ABSTRACT

This publication presents assessments of trends in
the educational achievement of elementary and secondary school
students. In light of the heightened reliance on achievement tests, a
careful appraisal of recent trends in scores has important
ramifications for educational policy. This study assesses test score
trends and offers some insights on the strengths and weaknesses of
the information they provide. While the 1960s saw a decline in
achievement scores in grades five and above, the decline was primarily
in areas involving higher order skills rather than basic skills. The
test score decline ended in the upper elementary grades beginning in
the mid-1970s. Achievement has been steadily rising; however,
examination of the data raises questions as to whether these score
improvements on some tests have been larger in the more basic skills
areas than in areas requiring deeper understanding. Minority
students' performance on tests has improved over the past 10 to 15
years and the gap between black and white students' scores has
narrowed. Further, Hispanic students have also made gains over the
past decade, with the greatest improvement being among Mexican
Americans. Finally, scores have improved in characteristically
low-achieving urban schools and communities. These and other findings
are discussed in detail throughout the body of the report and are
supported by extensive statistical tables and charts. (CO)
Except where otherwise noted, dates used in this paper are school years
rather than calendar years. For example, the results of a test administered
in the fall of 1979 and the spring of 1980 are both labeled 1979. As a result,
the dates used here are in some instances a year earlier than those in other
published sources. This discrepancy is particularly common in the case of
PREFACE



At the request of the Subcommittee on Education, Arts, and Humanities
of the Senate Committee on Labor and Human Resources, the Congressional
Budget Office (CBO) prepared this assessment of trends in the educational
achievement of elementary- and secondary-school students. This volume
presents the analysis of the trends themselves; a forthcoming companion
volume, Educational Achievement: Explanations and Implications of
Recent Trends, evaluates many common explanations of the trends and
discusses their implications for education policy. In accordance with CBO's
mandate to provide objective and impartial analysis, neither volume
contains recommendations.

Daniel Koretz of CBO's Human Resources and Community Development
Division prepared the analysis under the direction of Nancy M. Gordon
and Martin D. Levine. Paul L. Houts edited the report. Ronald Moore typed
the many drafts and prepared the manuscript for publication.

The author thanks the following organizations for providing essential 
data, much of which is unpublished: the National Assessment of Educational 
Progress, the American College Testing Program, the Iowa Testing 
Programs, the College Board, CTB/McGraw-Hill, Science Research
Associates, the state departments of education in New York, Virginia,
Texas, North Carolina, Nevada, California, and Illinois, and the school
districts of Cleveland, Montgomery County (Maryland), and Houston.

Many individuals contributed in various ways to this work. Particular 
thanks are due H. D. Hoover of the Iowa Testing Programs and Lawrence
Rudner of the Office of Educational Research and Improvement, U.S.
Department of Education, who provided insightful contributions at many
stages of the project. Other individuals whose assistance is gratefully
acknowledged include Robert Cameron and Harlan Hanson of the College
Board; Douglas Coulson of the Office of Technology Assessment, U.S.
Congress; Robert Forsyth of the Iowa Testing Programs; Gene Guest of
CTB/McGraw-Hill; Eric Hanushek of the University of Rochester; Lyle
Jones of the University of North Carolina; Jackie Woods of the American
College Testing Program; and Edward M. Gramlich, Jack Rodgers, and
Roberton Williams of CBO. Kenneth Rubin of CBO also provided
particularly valuable comments.
SUMMARY



Over the past several years, the educational achievement of American
students has become a focus of intense public discussion and has led to a
serious reexamination of schooling in America. A number of developments
have contributed to this concern, including a substantial decline in test
scores in the 1960s and 1970s, the weak performance of American students
relative to their peers in some other countries, and the large gap in average
test scores between some minority groups and nonminority students. More
positive trends, though significant, have gained less notice--for example,
the end of the overall achievement decline in the 1970s, a subsequent upturn
in average scores, and recent gains of black and Hispanic students relative
to nonminority students.

With the growing concern about public education has come an increasing
reliance on achievement tests as indicators of the performance of
students and schools. This trend has taken many forms and is apparent from
the local to the national level. Many states and localities have expanded
their programs of routine testing, sometimes as a result of legislation; the
additional tests are often used as minimum criteria for promotion into
higher grades or for graduation. Furthermore, average test scores have
become a common basis of comparisons among schools and districts, and in
some communities, newspapers routinely publish test results to facilitate
such comparisons. The U.S. Department of Education has begun annual
publication of average college admissions test scores on a state-by-state
basis, and some states have taken steps to alter their own achievement tests
to make their results comparable. Test scores have in fact come to be used
as a national report card, influencing decisions from the level of individual
students to that of national educational policy.

In the light of this heightened reliance on achievement tests, a careful
appraisal of recent trends in test scores has important ramifications for
educational policy. This paper assesses test score trends among elementary
and secondary school students; it also discusses the strengths and
weaknesses of the information they provide. A forthcoming companion
study, Educational Achievement: Explanations and Implications of Recent
Trends, evaluates common explanations of the trends and explores
implications for educational policy.



TRENDS IN EDUCATIONAL ACHIEVEMENT

April 1986



THE POLICY CONTEXT OF CURRENT CONCERNS 



Although states and localities bear primary responsibility for elementary 
and secondary education, educational achievement is clearly a national 
concern. Indeed, the current debate has been national in both scope and 
content. It has focused in part on such national issues as the competitiveness
of the American economy and national security--questions that have
been recurrent themes in debate about educational policy at least since the
turn of the century. Moreover, the debate has taken hold in all regions of
the country, and many of the initiatives undertaken by states and localities
reflect common themes and share common elements, such as increased
reliance on achievement testing. As in the past, both the Congress and the
Administration have been important participants in the debate through
legislative proposals and the dissemination of information.



UNDERSTANDING MEASURES OF EDUCATIONAL ACHIEVEMENT 



Although the use of standardized tests as indicators of educational
achievement has grown sharply in recent years, scores on standardized tests
are not as straightforward an indicator of achievement as they might first appear.
For that reason, the strengths and weaknesses of existing tests should be 
kept in mind when interpreting recent trends. 

The advantages of standardized tests--or, rather, the advantages that
they can have if carefully constructed--are obvious and important. By
imposing a uniform measure, they can avoid much of the subjectivity and
extraneous variation that plagues some alternative forms of evaluation, such
as grade-point averages. Standardized tests can be designed to provide
valuable comparisons over time and among grade levels, tap specific types 
of skills, and differentiate among students at various achievement levels. 

The weaknesses of standardized tests are less apparent but equally
significant. In most cases, the tests are not direct and complete measures
of the skills that are of concern. Rather, they are proxies for this often
unobtainable ideal. Designing the proxy entails many decisions about the
test's purpose, content, level of difficulty, format, the severity of time
pressure, and other factors. As a result, tests vary markedly in what they
measure and how well they measure it. Indeed, even apparently similar 
tests often produce divergent results. 

Tests designed to assist in selecting students for admission to
college--such as the Scholastic Aptitude Test (SAT)--provide a particularly
striking example of tests as proxies for other, unobtainable measures. These
tests are intended to predict students' performance in college, which can
be measured directly only long after the admissions decision must be made.
Although these tests comprise multiple-choice questions, their purpose is to
predict future success on some very different tasks--such as comprehending
long lectures and writing fluent term papers--that help determine whether
students succeed or fail in college. In the case of tests designed to measure
students' current level of achievement, the contrast between the skills
embodied in a test and the corresponding skills with which schools are
concerned is often less striking, but it can nonetheless be substantial.

Because of these limitations, the results of standardized tests must be
interpreted cautiously. Trends should be given credence if they appear with
considerable consistency in numerous tests, particularly if the tests are
varied. On the other hand, trends that appear only on one test, or only
among a set of very similar tests, should be considered questionable.
Moreover, whether trends shown by a test are meaningful hinges on whether
the characteristics of that test are appropriate for the particular issue in
question. For example, if trends among students in general are at issue,
college admissions tests can provide dubious information. A large number of
students never take such tests, which makes the results unrepresentative of
the student population as a whole. Furthermore, biases are introduced by
changes in the composition of the group that does take the tests. Similarly,
some minimum-competency tests provide little information about trends
among high-achieving students for want of a sufficient number of difficult
test items.



THE DECLINE AND SUBSEQUENT UPTURN 
IN ACHIEVEMENT TEST SCORES 



After years of improvement, scores on achievement tests began a
sizable drop in the mid-1960s. The decline was widespread, occurring among
many different types of students, on many different tests, in all subject
areas, in private as well as public schools, and in all parts of the nation. 1/

Although the size of the decline varied greatly from one test to 
another, it was in many instances large enough to be of substantial 
educational concern. In general, the decline in test scores was larger in the



1. A few tests did not conform to this pattern. The National Assessment of Educational
Progress (NAEP), for example, showed no overall drop in reading since 1970, and the
American College Testing program (ACT) tests showed no decline in natural science.
But these exceptions were few enough, and the conforming tests sufficiently numerous,
that the generality of the decline is clear.
higher grades. Scores on tests administered in grades three and below
dropped little, if at all, and tests administered in grade four showed only
inconsistent and small declines. On the other hand, most tests administered
in grades five and above showed declines in average scores, with the largest
drops tending to occur at the high school level. Among the achievement
tests assessed in this study, the average decline in grades six and above was
large enough that the typical (median) student at the end of the decline
exhibited the same level of achievement as was shown before the decline by
students at the 38th percentile. 2/ A different assortment of tests,
however, would yield a different estimate of the decline's average
magnitude.
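
The correspondence between a decline expressed in standard deviations and the percentile figure cited above can be checked with a short calculation. The sketch below is illustrative only: it assumes that test scores are approximately normally distributed (an assumption made here for illustration, not one stated in this paper), and the function name is hypothetical. Under that assumption, a student scoring roughly 0.3 standard deviation below the earlier median falls at about the 38th percentile of the earlier distribution.

```python
from statistics import NormalDist


def percentile_after_decline(decline_in_sd: float) -> float:
    """Percentile rank, within the pre-decline score distribution, of a
    student who scores `decline_in_sd` standard deviations below the
    pre-decline median.

    Assumes normally distributed scores (an illustrative assumption).
    """
    return 100.0 * NormalDist().cdf(-decline_in_sd)


if __name__ == "__main__":
    # A decline of roughly 0.3 standard deviation, the figure given in
    # the accompanying footnote.
    print(round(percentile_after_decline(0.3)))  # 38
```

A decline of zero leaves the median student at the 50th percentile, and larger declines push the equivalent percentile lower; a different mix of tests, with a different average decline, would be converted in the same way.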

Although not all skills commonly considered "basic" escaped serious
deterioration, the score decline appears to have been greater in areas
involving higher-order skills. For example, between 1972 and 1977, the
National Assessment of Educational Progress in mathematics showed no
change in the performance of 17-year-olds in the simple recall of facts and
definitions, but substantial declines took place on test items tapping deeper
understanding and problem-solving skills. Items testing arithmetic
computation showed a mixed pattern; in general, the more complex items evidenced
the sharpest drops in success rates. This larger drop in higher-level skills
might be one cause of the greater test score decline in the higher
grades.

The overall decline in test scores generally ended with the cohorts of
children born around 1962 and 1963--that is, with the cohorts that entered
school in the late 1960s. Thus, the decline's end first appeared in tests
administered in the upper elementary grades in the mid-1970s. Thereafter,
it moved into the higher grades at a rate of roughly a grade per year as
those birth cohorts aged, reaching the senior high school grades in the late
1970s (see Summary Figure 1). This pattern, however, has gained relatively
little attention. Perhaps because of the greater notice accorded to tests at
the senior high school level, there has been a widespread misconception that
the decline ended only within the past few years.
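
The way a cohort-based turning point moves up through the grades at about a grade per year can be made concrete with a small calculation. The sketch below is a rough illustration, not the method used in this paper: it assumes that children enter first grade at age six and advance one grade each year with no retention, and the function name and default entry age are hypothetical. School years are labeled by the fall in which they begin.

```python
def school_year_in_grade(birth_year: int, grade: int, entry_age: int = 6) -> int:
    """Fall calendar year of the school year in which a birth cohort is
    enrolled in `grade`.

    Assumes entry to grade 1 at `entry_age` and promotion of one grade
    per year (both are simplifying assumptions for illustration).
    """
    return birth_year + entry_age + (grade - 1)


if __name__ == "__main__":
    # Cohorts born in the early 1960s, followed through grades 5, 8, and 12.
    for birth_year in (1962, 1963):
        for grade in (5, 8, 12):
            print(birth_year, grade, school_year_in_grade(birth_year, grade))
```

Under these assumptions the early-1960s cohorts reach grade 5 in the early to mid-1970s and grade 12 at the very end of the decade, which is broadly in line with the timing described above; actual entry ages and promotion patterns vary, so the dates are approximate.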

In fact, subsequent cohorts of children--those entering school in the
late 1960s and throughout the 1970s--produced a sharp rise in scores on
most, but not all, tests. In the majority of instances in which scores
increased, the rise has been steady--with each cohort tending to outscore
the preceding one--and often roughly as fast as the decline. As a result,
achievement in the elementary grades is now by some measures at its
highest level in three decades. At the other extreme, scores on tests
administered to high school students, such as the Scholastic Aptitude Test 



2. The average decline on these tests was roughly 0.3 standard deviation.
Summary Figure 1.

Iowa Average Test Scores, Grades 5, 8, and 12,
Differences from Post-1964 Low Point
Summary Figure 2.

SAT-Mathematics Scores by Ethnicity: Black and Nonminority Students

SOURCE: The College Entrance Examination Board, "College Board Data Show Class of '85 Doing Better on SAT,
Other Measures of Educational Attainment" (press release, The College Board, September 1985).



(SAT), still remain relatively close to their low points of the late 1970s, 
probably because of the shorter interval since scores began to rise again in 
those age groups. While it appears that these improvements are occurring 
at many skill levels, the data raise disturbing questions of whether the 
improvements on some tests have been larger in the more basic skills than in 
areas requiring deeper understanding.

Another important issue in the debate over educational achievement is 
the performance of minority students on standardized tests. Over the past
10 to 15 years--a period that encompassed both declining and improving test
scores--the average scores of some minority students rose compared with
those of nonminority students. The relative gains of black students appear
on every test for which separate trend data for black students are available.
Although the gap in average scores between black and nonminority students
remains large, it has narrowed appreciably (see Summary Figure 2). 3/ Some



3. On the SAT, for example, the rate at which the scores of black and nonminority students
have converged over the past nine years is comparable to that of the total decline in
scores among all students taking the test--a trend that few observers have labeled
insignificant.
test results suggest that the scores of black students showed lesser
decreases than did those of nonminority students during the final years of 
the achievement decline, stopped declining earlier, and showed greater 
improvement during the first years of the overall upturn in scores. 

In addition, Hispanic students, who also typically have average scores 
well below those of nonminority students, showed relative gains over the 
past decade. The improvement appears to have been greater among 
Mexican-American students than among other Hispanics. These patterns are
less clear-cut, however, because of more limited data, ambiguities in the
classification of diverse Hispanic students, and the relatively small number 
of Hispanics in the test data. 

The period since 1970 also included relative improvement of average 
test scores in certain characteristically low-achieving types of schools and 
communities. Between 1977 and 1981, mathematics scores on the National 
Assessment of Educational Progress rose much more sharply in high- 
minority schools (those with minority enrollments of 40 percent or more) 
than in other schools. This upturn suggests that the gains of minority
students cannot be attributed entirely to those attending schools with low
concentrations of minority students. Students in disadvantaged urban
schools also showed relative gains in the National Assessments of
mathematics and reading. In mathematics, for example, average scores of
9- and 13-year-old students in disadvantaged urban communities rose
markedly after 1972, while those of students in other localities rose little or
not at all. These relative gains were sizable; by 1981, a fourth to a third of
the gap in test scores between disadvantaged urban communities and the
rest of the nation had been overcome.
INTRODUCTION



Concern about the educational achievement of American students has
recently reached its most serious level since the Sputnik-inspired reform era
of the 1950s and 1960s. One source of this concern has been a growing
public awareness that achievement levels had, by many measures, dropped
considerably during the 1960s and 1970s, and that American students
compare poorly on achievement tests with their peers in many other
nations. 1/ A number of prominent reports--such as A Nation at Risk--
have amplified public concerns about the achievement of American students
and called for major changes in the educational system. 2/

The current widespread focus on the educational achievement of 
students is a part of a much broader concern about the state of American 
public education. For example, recent reports have cited such issues as
apparent declines in the academic qualifications of newly trained teachers;
growing shortages of teachers, particularly in certain subject areas; a
perceived failure of educational institutions to keep pace with the demands
of a technologically changing society; major changes in the characteristics
of the school-age population (such as the growing proportion comprising
ethnic minorities and children from single-parent families); poor school
discipline; and student abuse of alcohol and other drugs. 

As concern about the state of public education has grown, Americans 
have increasingly come to judge the quality of their schools by the results of 
achievement tests. This trend is apparent from the local to the national 



1. These facts were documented during the 1960s and 1970s, but gained relatively little 
public attention until the past few years. See, for example, Annegret Harnischfeger
and David E. Wiley, Achievement Test Score Decline: Do We Need to Worry? (Chicago:
ML-GROUP for Policy Studies in Education, 1975); Advisory Panel on the Scholastic
Aptitude Test Score Decline, On Further Examination (New York: College Entrance
Examination Board, 1977); Torsten Husen, ed., International Study of Achievement
in Mathematics: A Comparison of Twelve Countries (Stockholm and New York: Almqvist
& Wiksell and John Wiley & Sons, 1967); and G. F. Peaker, An Empirical Study of
Education in Twenty-One Countries: A Technical Report (New York: John Wiley and
Sons, 1975).

2. National Commission on Excellence in Education, A Nation at Risk (Washington, D.C.:
U.S. Government Printing Office, 1983).
level. In some localities, for example, newspapers routinely publish
comparisons of the average test scores obtained by students in various
schools. On the national level, this tendency has taken several forms,
perhaps the most salient of which is the now annual publication by the U.S.
Department of Education of the average scores on college admissions tests 
attained by students in each of the states. Indeed, test scores have come to 
be used as a national report card on the schools. 

Despite the current emphasis on educational achievement, surprisingly 
little attention has been given to some of the more positive recent trends in 
the achievement of elementary and secondary school students. The declines 
of the 1960s and 1970s ended some time ago (as much as a decade ago in the 
early grades) and have since been superseded by a sizable upturn in test 
scores. This change has only recently begun to gain widespread recognition 
and as yet has had little apparent impact on educational initiatives. 
Similarly, although the large gap in average test scores between nonminority 
and minority students has been widely acknowledged, the fact that this gap 
has been slowly but appreciably narrowing in recent years has gained far less 
attention. 

The current heavy reliance on achievement tests makes it critical to 
gauge recent trends in test scores, to understand the strengths and limita- 
tions of test scores as indicators of educational achievement, and to explore 
their implications for educational policy. This paper assesses recent trends 
in the achievement test scores of American elementary and secondary 
school students. It assesses both aggregate trends and variations among 
groups of students, types of communities, and types of tests. It considers a 
wide variety of tests in order to ascertain the consistencies underlying the 
sizable and often unexplained variation in their results. The analysis shows
that some patterns are reasonably consistent among tests and therefore
warrant confidence, while others are restricted to one or a few tests and 
thus should be considered questionable. A forthcoming companion paper, 
Educational Achievement: Explanations and Implications of Recent 
Trends, evaluates common explanations of the achievement trends and 
explores the implications of the trends and of their explanations for 
educational policy. 



THE CONTEXT OF THE CURRENT CONTROVERSY 



Although states and localities have primary responsibility for public 
elementary and secondary education--and together provide over 90 percent
of the money spent for this purpose by all levels of government--education
is a truly national concern. Debate about educational policy thus often
emphasizes questions of national interest. For example, although there is
surprisingly little evidence about the specific skills and abilities that
contribute to success in different occupations, the impact of education on
the productivity of the nation's workforce has been an important point of
debate at least since the turn of the century. 3/ Similarly, the implications
of educational policy for national security have often been the focus of
attention. Congressional and administration concerns about educational
achievement accordingly have often been more far reaching than the 
relatively small federal role in elementary and secondary education might 
suggest. 

The current national debate about elementary and secondary education--
and the participation of the Congress and the administration in the
controversy--have numerous historical parallels. For example, current
concern that the most able students be given sufficiently challenging
curricula has parallels in the 1893 report of the "Committee of Ten"--considered
by some historians to be the first major national report on the high
school. 4/ Similarly, contemporary concern that other students be adequately
prepared for the demands they will face after leaving school has
precursors in another early national report--The Cardinal Principles of
Secondary Education, published in 1918--as well as in Congressional and
administration actions around the time of the First World War. 5/

The current wave of concern about educational achievement also 
mirrors its predecessors in having sparked policy initiatives at all levels of 
government. The impact of achievement tests, however, in contrast to less 
specific notions of achievement, has grown much more substantial. Certain
uses of tests--for example, minimum-competency tests and other state-



3. For a description of the technical and economic emphasis in educational debate and
programs around the turn of the century, see, for example, David K. Cohen and Barbara
Neufeld, "The Failure of High Schools and the Progress of Education," Daedalus (Summer
1981), vol. 110, pp. 69-81; and Thomas James and David Tyack, "Learning from Past
Efforts to Reform the High School," Phi Delta Kappan (February 1983), vol. 64,
pp. 400-406. The relevance of such considerations to federal education policies since
1917 is discussed briefly below.

4. James and Tyack, "Learning from Past Efforts."

5. Ibid.; Carl F. Kaestle and Marshall S. Smith, "The Federal Role in Elementary and
Secondary Education, 1940-1980," Harvard Educational Review, vol. 52 (4) (November
1982), pp. 384-408.
Figure 1-1.
mandated tests--have grown markedly since the 1970s. Test results now
have effects that greatly exceed their impact in earlier eras. These
consequences are diverse, ranging from the level of individual students to 
that of national policy. They include, for example, decisions about the 
promotion or graduation of individual students; changes in curricula and 
instruction; the distribution of funds among schools; and changes in 
educational policy at both the federal and state levels. 



Trends in the Federal, State, and Local
Roles in Elementary and Secondary Education

Funding for and control over elementary and secondary education was
initially a largely local concern. A significant state role began to emerge in
the nineteenth century, however, and has continued to grow since. 6/ At the



6. Kaestle and Smith, "The Federal Role."
end of World War II, the states on average supplied about a third of the
revenue receipts of public elementary and secondary schools, while local
sources provided most of the remainder (see Figure 1-1). The state share
continued to increase, although erratically, in the postwar years, and has
roughly equaled the local share for nearly a decade. 7/ The state share,
however, varies greatly; in 1982, it ranged from 9 percent in New Hampshire
to 75 percent in Washington and New Mexico and 78 percent in Alaska. 8/

The delineation of state and local responsibilities has also changed 
over time and varies from one state to another. But both states and
localities have clear reasons to be concerned with achievement trends, since
they share responsibility for broad questions of curriculum, course
requirements, and testing. 9/

The federal role in elementary and secondary education has always 
been more limited than that of states and localities. Until the end of World
War II, the federal government contributed less than 1.5 percent of public
school revenues (see Figure 1-1). The federal share climbed to roughly 4
percent over the next decade and remained at that level until the mid-
1960s, when it jumped to a range of 8 percent to 9 percent. It remained at
that level for about a decade more. From 1977 through 1980, the federal
share briefly grew to over 9 percent; thereafter it dropped. By the most
common accounting, the federal contribution in the 1983 school year was
about $8.7 billion--just under 7 percent of the $128 billion in total public
school revenues. 
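
The share quoted for the 1983 school year follows directly from the two dollar figures in the text. The short sketch below simply reproduces that arithmetic; the function name is hypothetical and the rounding to one decimal place is a presentational choice made here.

```python
def share_percent(part_billions: float, total_billions: float) -> float:
    """Return `part_billions` as a percentage of `total_billions`."""
    return 100.0 * part_billions / total_billions


if __name__ == "__main__":
    # Federal contribution and total public school revenues, in billions
    # of dollars, as given in the text for the 1983 school year.
    federal_share = share_percent(8.7, 128.0)
    print(f"{federal_share:.1f} percent")  # 6.8 percent
```

The result, about 6.8 percent, is consistent with the description of the federal share as just under 7 percent.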



7. That state and local contributions are currently roughly equal is not a matter of
controversy, but the precise federal, state, and local shares shown in Figure 1-1 are
open to question. These estimates, which are from the National Center for Education
Statistics, are used because they are perhaps the most common and because they are
available for a relatively long historical period, but their use does not represent a
judgment about the relative validity of the alternatives. The Census Bureau's Annual
Survey of Government Finances yields roughly similar estimates of federal and state
contributions but a larger estimate of local funding; the state share is estimated to be
a bit lower than the local. Recent alternative estimates from the National Center for
Education Statistics show a substantially larger federal share. They do not address
the split between local and state sources, however, and are available only for recent
years. See National Center for Education Statistics, Digest of Education Statistics,
1983-84 (Washington, D.C.: NCES, 1983), Table 62; Bureau of the Census, Finances
of Public School Systems in 1983-84, GF84-No. 10 (Washington, D.C.: U.S. Department
of Commerce, 1985), Table B; and National Center for Education Statistics, Federal
Support for Education, Fiscal Years 1980 to 1984 (Washington, D.C.: NCES, 1985).

8. National Center for Education Statistics, The Condition of Education, 1985 Edition
(Washington, D.C.: NCES, 1985), Table 1.10. Hawaii and the District of Columbia,
both of which comprise only a single school district, are excluded from this comparison.

9. See, for example, "Changing Course: A 50-State Survey of Reform Measures," Education
Week, vol. 4, number 20 (February 6, 1985), pp. 11-30.
The growth in federal funding in part reflected qualitative changes in
the nature of federal involvement. Until the 1950s, federal education funds
were devoted to a few very narrow purposes. In 1950, for example, federal
funds supported only three educational programs, two of which focused on
small portions of the school-age population--namely, fiscal assistance to
localities affected by federal installations and the education of native
American children. Support for vocational education was the sole
educational program aimed at a broad segment of students. Moreover, in that
year, over half of federal aid was provided, not for educational programs of
any sort, but rather for school lunches. 10/ Since then, a variety of laws
have greatly broadened federal involvement in elementary and secondary 
education. 

Despite the relatively recent expansion of federal involvement in 
elementary and secondary education, however, federal efforts to improve 
the performance of American students date back to the early part of this 
century. Moreover, the rationale for that involvement has often reflected a 
common theme: a national interest in the competence and productivity of 
the labor force produced by the schools.

The Smith-Hughes Act of 1917, which established federal support for 
vocational education, is often described as the first categorical federal 
program in elementary and secondary education. One of the aims of this 
bill, which remains funded to this day, was to improve the skills and 
productivity of the workforce as a response to international rivalry. 11/ The
National Defense Education Act of 1958 (NDEA), which authorized a variety 
of activities designed to improve instruction in mathematics, science, and 
foreign languages, had a similar rationale. 12/ Some historians argue that 
the NDEA had its roots in dissatisfactions with the educational system 
dating back to the early 1950s. But the launching of Sputnik in 1957 and 
heightened concern about America's international stature and competi- 
tiveness clearly added to the NDEA's momentum and shaped debate about 
the act. 13/ Some of the concerns of the Smith-Hughes Act were thus 
mirrored in the NDEA's statement of purpose: 



10. Hollis P. Allen, The Federal Government and Education: The Original and Complete
Study of Education for the Hoover Commission Task Force on Public Welfare (New York:
McGraw-Hill, 1950); cited in Kaestle and Smith, "The Federal Role."

11. Kaestle and Smith, "The Federal Role," pp. 388 and 391.

12. Public Law 85-864; 72 Stat. 1580.

13. Kaestle and Smith, "The Federal Role," p. 393.

The Congress hereby finds and declares that the security of the
Nation requires the fullest development of the mental resources
and technical skills of its young men and women. The present
emergency demands that additional and more adequate
educational opportunities be made available.... 14/

The large jump in federal funding for elementary and secondary
education in the mid-1960s reflected the passage in 1965 of the Elementary
and Secondary Education Act (ESEA; Public Law 89-10). ESEA created a
broad array of federal education programs, including the compensatory
education program that remains the largest single source of federal funds
for public schools. 15/ The statement of purpose of the ESEA noted
concerns similar to those that motivated Smith-Hughes and the NDEA.
Title I accounted for most of the authorized funds, and the act's statement
of purpose accordingly focused on an intent to improve the educational
opportunities open to disadvantaged students. Nonetheless, the statement
also cited concerns more similar to those of Smith-Hughes and the
NDEA--the nation's well-being and security. 16/

Similar concerns have been voiced again during the past few years.
The report of the National Commission on Excellence in Education, A
Nation at Risk, asserted that "Our once unchallenged preeminence in
commerce, science, and technological innovation is being overtaken by
competitors throughout the world. This report is concerned with only one of
the many causes ... of the problem, but it is the one that undergirds American
prosperity, security, and civility." 17/ Another prominent critique of the
educational system, produced by the "Task Force on Education for Economic
Growth," began by maintaining that improving education is one of the few
national efforts that "can legitimately be called crucial to our national
survival." 18/ The Committee Report for the Education for Economic
Security Act of 1984, which established a new federal effort to improve



14. Public Law 85-864, Section 101.

15. Title I of ESEA, now Chapter 1 of the Education Consolidation and Improvement Act
of 1981.

16. Elementary and Secondary Education Act of 1965, H. Rept. 113, 89:1 (1965).

17. National Commission on Excellence in Education, A Nation at Risk, p. 5.

18. Task Force on Education for Economic Growth, Action for Excellence (Denver: Education
Commission of the States, 1983), p. 3.

instruction in mathematics and science, sounded similar themes of national
prosperity and security. 19/

In addition to these intermittent direct efforts to improve student 
performance, the federal government has also taken on an indirect role in 
this effort by generating, collecting, and disseminating educational information
and statistics. Although this role has grown substantially in recent
decades, it extends back for more than a century, and it has generally been 
less controversial than the more direct efforts. The U.S. Department of 
Education was established in 1867 primarily to gather statistics about 
education, and that role has continued without interruption to the 
present. 20/ A National Advisory Committee on Education was established 
in 1954 to advise the Secretary of Health, Education, and Welfare on 
educational studies of national concern, and the National Institute of 
Education was created by the Education Amendments of 1972 (Public Law 
92-318). Other major federal efforts to generate, collect, or disseminate 
information on education accompanied the more direct activities. 

Although these information-related activities receive only a small 
proportion of federal funding for elementary and secondary education, the 
federal contribution provides a great deal--in some cases, the lion's share--
of resources available for carrying them out. In a number of instances, the
data generated by the federal government have been unique. For example, all
of the truly nationally representative indicators of educational achievement
used in this paper--the National Assessment of Educational Progress, the
High School and Beyond study, the National Longitudinal Study of the High
School Seniors Class of 1972, and Project TALENT--were funded by the
federal government. 



Recent Policy Initiatives 

Numerous recent federal, state, and local efforts to improve educational 
achievement have reflected these historical patterns. Many state and local
governments have made sweeping changes in curricula, high school gradua- 
tion requirements, testing programs, policies for the certification and 



19. Education for Economic Security Act, S. Rept. 98-181, 98:2 (1984), p. 1.

20. The Department of Education was renamed the Office of Education shortly after its
establishment and retained that designation until 1979.

compensation of teachers, and other educational policies. 21/ The
Administration has emphasized its information-dissemination role in
attempts to prompt reforms. 22/ Some of the legislation considered by the
Congress (such as the Education for Economic Security Act of 1984) has followed in the
tradition of Smith-Hughes and the NDEA in focusing efforts on specific
subjects that were considered by the act's proponents to be of particular
importance to the nation's competitiveness and security. Other legislation,
such as the Secondary Schools Basic Skills Act, would follow in the mold of
Title I of ESEA in funding additional basic-skills instruction for
educationally disadvantaged students. 23/

Trends in educational achievement--particularly, the decline of the
1960s and 1970s--have often been cited as a rationale for recent educational
initiatives, and some proposals appear to be predicated on assumptions
about the causes of those trends. Many of the recent initiatives,
however, are not fully consistent with either the trends or the limited
information on their causes. For example, some of the proposals do not take
into account the nearly uninterrupted increase in test scores in the earliest
grades. Others aim primarily at specific curriculum areas--such as the
most basic skills--that have shown relatively favorable trends.

Congruence with recent achievement trends is of course only one of
many bases on which to ground educational initiatives. Changing a given
educational practice, for example, might improve average levels of achievement
even if--contrary to common view--that practice did not actually
contribute to the decline. But as long as achievement trends are offered as
rationales for educational policy changes, the consistency between the
proposals and the trends is important to evaluate. Moreover, a more
comprehensive view of the trends and their causes allows one to design
initiatives to counter the severest problems, to capitalize on recent positive
trends, and perhaps to target some of the root causes of both.



21. For example, "Changing Course: A 50-State Survey;" Staff of the National Commission
on Excellence in Education, Meeting the Challenge: Recent Efforts to Improve Education
Across the Nation (Washington, D.C.: Department of Education, November 1983).

22. For example, National Commission on Excellence in Education, A Nation at Risk; U.S.
Department of Education, State Education Statistics: State Performance Outcomes,
Resource Inputs, and Population Characteristics, 1982 and 1984 (January 1985); U.S.
Department of Education, Indicators of Education Status and Trends (January 1985).

23. S. 508, introduced by Senator Bradley, and H.R. 901, introduced by Representative
Williams.

In recent years, the use of standardized tests as indicators of achievement
has been burgeoning. These tests are diverse, including minimum-competency
tests (MCTs), college admissions tests, and "norm-referenced"
achievement tests. All of them, however, have one common characteristic:
they apply a uniform measure to gauge the performance of diverse students
in a wide variety of settings.

Many advantages of standardized tests over alternative measures--
such as grade-point averages and locally developed tests--are obvious. On
the other hand, while the limitations of standardized tests are less obvious,
they can be severe. 1/

Perhaps the most important strength of standardized tests is that they 
can be freed of much of the subjectivity that can plague such alternative 
measures as teachers' grades or class rank. They can also avoid other
extraneous variations in evaluations of student performance, such as differ- 
ences in grading standards. If appropriately designed and scored, standard- 
ized tests can be made comparable over time and can yield useful 
information about trends that is unavailable from other sources. Standard- 
ized tests can also be designed to provide valid indices of specific aspects of 
achievement. They can be designed, for example, to differentiate among 
particularly high- or low-achieving students, tap specific types or levels of
skills, or provide comparable information on the performance of students in
different grade levels.

Despite these strengths, the seemingly straightforward information 
provided by standardized tests often masks considerable complexity and 



1. Although many of the key issues in testing are technically complex, this chapter provides 
a largely nontechnical description for readers who are unfamiliar with testing and 
statistics. Readers desiring a more detailed and technical discussion of the issues
discussed in this chapter are referred to "Testing: Concepts, Policy, Practice, and
Research," a special edition of The American Psychologist, vol. 36 (October 1981), and,
in particular, to Bert Green, "A Primer of Testing," pages 1001-1012 in that volume,
on which parts of this chapter draw substantially.

ambiguity. One indication of the limitations of standardized tests is the
often marked disparities in the results they yield (see Chapter III). This
divergence can reflect differences in the purposes and construction of the 
tests, such as discrepancies in content, level of difficulty, or test format. 
On the other hand, its causes are often poorly understood, and it can also 
appear when tests are apparently similar. 

The limitations of standardized tests are particularly severe when they 
are used to compare schools, districts, states, or other aggregates--as they
increasingly have been in recent years. Such comparisons are difficult and
can be seriously misleading. Standardized measures in themselves can
remove only some, but not all, of the extraneous variation among groups.
For example, comparisons among jurisdictions can be seriously biased by 
differences in dropout rates, the composition of the school-age population, 
rules governing exclusion of certain groups from testing, and the closeness 
of the match between the test and curricula. 

Using standardized tests to gauge trends is also especially problem- 
atic. To assess trends accurately, test results must be made comparable 
from one testing to the next. This process is more difficult than it might
seem (as is described below). When test results are not made fully
comparable, estimates of trends can be seriously distorted.



EDUCATIONAL TESTS VERSUS EDUCATIONAL ACHIEVEMENT 



Although popular accounts often treat test scores as synonymous with 
educational achievement, the two are in fact very different. In most cases, 
tests are not direct and comprehensive measures of educational achievement.
Rather, they are proxies, or substitutes, for such ideal but generally
unobtainable measures, varying markedly in how much they differ from the
ideal. The choices made in designing that substitution are many and have a
large impact on the results obtained. 

Perhaps the best way of understanding an educational test is to 
consider it an activity, the performance of which is intended to predict 
some other performance or attribute that is more difficult to measure 
directly. 2/ In some instances, what the test predicts cannot be directly



2. Douglas Coulson of the Office of Technology Assessment suggested this metaphor.

measured because it lies in the future (such as performance in subsequent
schooling or work). In other cases, the test is a proxy for a present
characteristic of the student--such as mathematics achievement--that is
difficult or impossible to measure completely. 

An example of a test that differs markedly from the activities for 
which it is a proxy is the Scholastic Aptitude Test (SAT). The SAT is
intended to predict students' performance in college, and much of the work
gauging that test's value assesses the correlations between SAT scores and
freshman-year college grades. 3/ Taking the SAT, however, is an activity
very different from most of those in which college students must succeed.
Those students who do well on a multiple-choice examination are not 
necessarily those who can concentrate through an hour-long lecture, 
discipline themselves to do considerable amounts of reading over a long 
period of time, or write well-organized and fluent term papers. For this 
reason, the SAT predicts college performance only imperfectly. 

While most achievement tests, unlike the SAT, are intended to assess 
the present knowledge or other current attributes of students rather than 
their future performance, striking differences can still exist between the 
activities constituting the test and the real-life skills for which they are 
proxies. For example, many tests use a multiple-choice format, in part
because of ease of scoring. The corresponding tasks in real life, however,
often involve quite different skills--writing prose, solving a mathematical
problem without any clue about possible solutions (and even without a clear 
statement of the problem), inferring or hypothesizing explanations of 
events, assessing the logic and persuasiveness of arguments, and so on. 

Given these differences between tests and the corresponding real-life 
activities, creating a test--and understanding the results of one already
administered--raise several sets of questions:

o What is the test's purpose, and what real-life skills are of
interest?

o What test activities--at what level of skill and in what
format--will be used to represent those real-life skills?

o To what extent is performance on the test actually a reasonable
gauge of the real-life skills of interest? and

o How are the scores scaled and reported?

3. Hunter M. Breland, Population Validity and College Entrance Measures, Research
Monograph Number 8 (New York: The College Board, 1979).

IMPORTANT CHARACTERISTICS OF EDUCATIONAL TESTS



Many characteristics of educational tests have a major impact on the 
results those tests yield. This section describes some of the most important 
test characteristics and illustrates their impact on test results.

What Is the Purpose of the Test? 

Most of the commonly discussed educational tests are designed to achieve 
one of three purposes: 

o Ascertain whether students have acquired specific skills or infor- 
mation; 

o Rank students in terms of their knowledge or skills; or 

o Predict subsequent performance. 4/

Tests That Ascertain Whether Students Have Acquired Specific Skills or
Information. Among the tests intended to gauge whether students have
acquired specific skills or knowledge are the minimum-competency tests
(MCTs) now used by many states and localities as criteria for promotion,
graduation, or remedial services. The content of these tests generally
reflects a judgment about the skills and knowledge that most or all students
should master, and thus the level of difficulty is often deliberately quite
low. Because tests of this type entail comparing a student's performance
with a concrete criterion for achievement, they are called criterion-referenced
tests.



4. Although using test results to compare or rank jurisdictions--schools, districts, and
states--is currently enjoying a vogue, none of the tests reported in this paper was designed
for that purpose. The difficulties that arise in using them to that end are discussed later
in this chapter.

How items are typically selected for inclusion in criterion-referenced
tests has important implications for comparisons among groups of students 
and for the assessment of achievement trends. Whether an item is selected 
depends primarily on the extent to which it represents an aspect of the 
criterion or skills to be taught. For that reason, assuming that the item has
no other problems (such as ambiguous wording), the proportion of students
correctly answering it can be irrelevant. In the case of MCTs, one might
find both test items that most students answer correctly and a large number
of very high scores. These results would reflect the typically low level of
achievement (the "minimum competency") used as a criterion and would
simply be interpreted as evidence that the schools are successfully
imparting that particular set of skills. 5/

When criterion-referenced tests such as MCTs include many questions 
that most students answer correctly (or incorrectly), comparisons between 
high- and low-achieving students often become very difficult to interpret.
For example, if the test is relatively easy, high-scoring students will score
near or at the maximum. Even so, some of their scores will be lower than
they might otherwise be, since the absence of more difficult items on the 
test leaves no way for the higher-achieving students to distinguish them- 
selves from others. This is often referred to as a ceiling effect; the 
opposite is called a floor effect. 

One result of the ceiling effect in some MCTs is that when scores are 
generally increasing--as has been the case with many tests in recent years--
they will tend to show low-achieving groups as gaining on higher-achieving
groups, even when all groups are actually improving comparably. Because of
the ceiling, the scores of the higher-achieving groups cannot increase
proportionately to mirror their true improvement.
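
The ceiling effect described above can be illustrated with a small numerical sketch. Everything in it is hypothetical: the maximum score, the "true" achievement values, and the size of the gain are invented for illustration and are not drawn from any test discussed in this paper. The sketch assumes that a student's observed score is simply the true score capped at the test's maximum.

```python
def observed_mean(true_scores, max_score):
    """Mean of scores after capping each one at the test's maximum (the ceiling)."""
    capped = [min(score, max_score) for score in true_scores]
    return sum(capped) / len(capped)

MAX_SCORE = 100  # hypothetical maximum on a relatively easy test

# Hypothetical "true" achievement on an uncapped scale.
low_before = [50, 60, 70]
high_before = [85, 95, 105]

GAIN = 10  # assumption: every student truly improves by the same amount
low_after = [score + GAIN for score in low_before]
high_after = [score + GAIN for score in high_before]

# The gap the test reports, before and after the uniform improvement.
gap_before = observed_mean(high_before, MAX_SCORE) - observed_mean(low_before, MAX_SCORE)
gap_after = observed_mean(high_after, MAX_SCORE) - observed_mean(low_after, MAX_SCORE)

print(f"Observed gap before: {gap_before:.1f}")
print(f"Observed gap after:  {gap_after:.1f}")
```

Under these assumptions the observed gap shrinks even though both groups gained the same amount, because part of the higher-achieving group's gain is lost above the maximum score.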

Tests That Rank Students in Terms of Their Knowledge or Skills. In contrast
to MCTs, those achievement tests that for years were the standard in
elementary and secondary schools rate students by comparison to the
performance of other students, rather than by comparison to an absolute
achievement criterion. For example, a student's performance might be
reported as being at the 75th percentile, meaning that it exceeded the
achievement of three-fourths of all students.



5. A very high success rate on an MCT, however, may be taken as a sign that the test is
no longer serving its function, since it no longer indicates skills that need improvement.
That is, it might call the achievement criterion itself into question. New Jersey, for
example, recently decided that its MCT needed replacement with a more difficult test
for this reason. Statewide Testing System, New Jersey Public Schools (Trenton: New
Jersey State Department of Education, January 1983).

The distribution of scores with which students are compared is called
the "norms," and such tests are therefore called norm-referenced. The
norms are typically derived from a national sample of students and are
generally revised infrequently--typically, at intervals of seven years or so.
Revision of the norms--often called "renorming"--generally entails both
revision of the test itself and retesting with a new national sample. One
technique, for example, is to revise the test and then to administer both the
old and new versions to a large national sample of students. This approach
provides both a new set of norms and a measure of the extent to which 
changes in scores reflect the revision of the test itself rather than a change 
in achievement. 
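
The logic of giving both versions to the same sample can be sketched as follows. This is an illustrative assumption, not the procedure of any particular test publisher: the score lists are invented, both forms are assumed to report scores on a common scale, and the variable names are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a list of scores."""
    return sum(values) / len(values)

# Hypothetical scores, assumed to be on a common scale.
old_form_old_sample = [48, 50, 52, 50]  # old test, original norming sample
old_form_new_sample = [51, 53, 55, 53]  # old test, new national sample
new_form_new_sample = [49, 51, 53, 51]  # revised test, same new sample

# Same test, different cohorts: a rough estimate of the change in achievement.
achievement_change = mean(old_form_new_sample) - mean(old_form_old_sample)

# Same cohort, different tests: a rough estimate of the effect of the revision.
revision_effect = mean(new_form_new_sample) - mean(old_form_new_sample)

print(f"Estimated change in achievement: {achievement_change:+.1f}")
print(f"Estimated effect of test revision: {revision_effect:+.1f}")
```

Holding the test constant isolates the change between cohorts, while holding the cohort constant isolates the change attributable to the test itself.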

Norm-referenced tests are often relatively free of the floor and 
ceiling effects that can plague interpretation of MCTs. Since norm-referenced
tests are designed to rank students, they typically must be easy
enough to differentiate among low-achieving students but difficult enough
to discriminate at the high end of the achievement distribution.

Performance on norm-referenced tests can be scored in many ways,
and one common scale--standard deviations, or SDs--is especially
important in understanding the trends reported in later chapters. The
reporting of scores in terms of standard deviations allows the comparison of
trends among many different tests. The distribution of scores on norm-referenced
tests typically resembles the "normal" or bell-shaped curve--that
is, many scores are clustered around the average score, while smaller
numbers of students obtain scores farther from the average (see
Figure II-1). 6/ When scores are distributed that way, the standard
deviation is a convenient measure of how far a given student's score is from
the average. A student scoring 1 standard deviation above the average has
exceeded the scores of about 84 percent of all students, and a student with a
score 2 SDs above the average has scored above 97.7 percent of all students.
(The measure is symmetrical, so that a student scoring 1 SD below the mean
has exceeded the scores of about 16 percent--100 minus 84--of all students.)
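
The percentages quoted above follow from the normal curve and can be reproduced with a short sketch. It assumes scores are exactly normally distributed, which real test scores only approximate; the function names and the example mean and standard deviation are hypothetical.

```python
import math

def percent_below(z):
    """Percent of a normal distribution lying below a score z SDs from the mean."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

def to_sd_units(score, mean, sd):
    """Express a raw or scaled score as a number of SDs from the mean."""
    return (score - mean) / sd

for z in (-1, 1, 2):
    print(f"{z:+d} SD: exceeds about {percent_below(z):.1f} percent of students")

# Hypothetical scale with a mean of 500 and an SD of 100.
print(to_sd_units(600, mean=500, sd=100))
```

Expressing scores in SD units in this way is what allows results from tests with different scales to be compared.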



6. Test scores generally do not entirely conform to the bell-shaped curve, but the departures
from the normal curve are often small and relatively unimportant for many purposes.
The distribution of SAT scores, for example, typically is a bit flatter near the mean than
is the normal curve, as a result of correlations between items on the test. It is also often
slightly skewed toward the higher end of the scale, although this varies with the subtest
and particular administration of the test. Finally, SAT scores are bounded at both ends,
with a minimum of 200 and a maximum of 800. (William Angoff and Gary Marco,
Educational Testing Service, personal communication, March 1986).

Figure II-1.

Hypothetical Test Results Expressed in Standard Deviations (SDs),
Based on the SAT-Mathematics (SAT-M)

SOURCE: Adapted from the 1984-1985 SAT-M scores, National College-Bound Seniors, 1985 (New York:
The College Board, 1985).

NOTE: The SAT is only approximately normal, although the deviations from normality are relatively minor for most
purposes (see the text).



Tests That Predict Future Performance. A variety of tests--including
college-admissions tests such as the SAT and the American College Testing
Program (or ACT) tests--are designed to predict future performance rather
than to assess current levels or past acquisition of skills. 

The SAT and ACT outwardly resemble the norm-referenced achieve- 
ment tests in many respects, and the trends shown by the two types of tests 
can in some respects be interpreted similarly. Moreover, the distribution of 
scores is nearly "normal," or bell-shaped, and thus students' scores can be
expressed in terms of the number of standard deviations from the average. 
Accordingly, they largely avoid ceiling and floor effects. 



IB TRENDS IN EDUCATIONAL ACHIEVEMENT 



April 1986 



Despite their outward similarity to norm-referenced achievement
tests, however, college-admissions tests are not necessarily indicators of
achievement. The value of such a test lies in its ability to predict
performance in college. A student's current level of achievement is only
one of many attributes that might predict future performance. Alternatives
might include, for example, general problem-solving abilities, attention
span, or such cognitive measures as fluid intelligence or spatial visualization.
Whether a test used to predict college performance relies substantially
on current achievement rather than other attributes thus depends on
whether one believes--or can demonstrate--that current achievement is a
better predictor than are those alternatives. In fact, the SAT is quite
dissimilar from most achievement tests. The mathematics portion, for
example, is intended to "depend less on formal knowledge than on reasoning"
and is deliberately not closely tied to secondary-school mathematics
curricula. The College Board has repeatedly protested the misuse of the
SAT as a measure of the effectiveness of elementary and secondary
education. 7/ The ACT, on the other hand, in many respects resembles
achievement tests more closely than does the SAT and is intentionally more
closely tied to secondary-school curricula. 8/



What Skills and Skill Levels Will Be Assessed? 

Once the purpose of a test is decided, decisions must be made about the 
actual test content--the specific skills and knowledge to be assessed and the
level of difficulty to be targeted. Unless the purpose of a test is extremely
narrow--for example, testing proficiency in two-digit subtraction problems--
these decisions are vexing and their solutions ambiguous. For example,
many diverse skills are subsumed by broad categories such as "reading" or
"mathematics," even at the elementary school level. Test makers must
choose among these skills and decide the relative emphasis that each of 
those chosen should receive. 



7. Advisory Panel on the Scholastic Aptitude Test Score Decline, On Further Examination
(New York: The College Entrance Examination Board, 1977), pp. 3 and 5; Statement
by Daniel B. Taylor, Senior Vice President, The College Board, before the Subcommittee
on Elementary, Secondary, and Vocational Education, Committee on Education and
Labor, U.S. House of Representatives, January 31, 1984.

8. Personal communication, Mark Reckase, American College Testing Program, January

Differences in test content and level of difficulty can radically affect
the results shown by ostensibly similar tests and can even change the 
fundamental conclusions one reaches about the condition of educational 
achievement. For example, the apparent size of the achievement decline of 
the 1960s and 1970s--and even the presence or absence of a decline--varies
with test content. 

Even once the mix of skills and knowledge to be tested is determined, 
important decisions remain about the context in which the skills are to be 
assessed and the test's level of difficulty. For example, in the area of
mathematics, the National Assessment of Educational Progress showed that
the achievement decline of the 1970s was larger in the case of test items
that embedded arithmetic skills in story problems than in the case of items
that tested the same skills through simple computational exercises such as
23 x 45. (Story problems are often seen as requiring higher-level skills--such
as reasoning--in addition to rote computational skills.) The National Assessment
also found no decline in the 1970s in lower-level reading skills (literal
comprehension) but some decline in higher-level skills (inferential 
comprehension). 



What Format Is Used? 

Although the impact of test format--for example, multiple-choice, fill-in-the-blanks,
open-ended short-answer, essay, and so on--is not completely
understood, it is clear that format can affect the mix of skills actually 
tested and thus the results obtained. 

In large-scale assessments, considerations of speed and cost create 
pressure to use a multiple-choice format. Multiple-choice tests can be
graded quickly and unambiguously, often by machine. In contrast, scoring
essay examinations can be time consuming, and guaranteeing even partial
consistency among graders--or even among essays scored by a single
grader--can be arduous.

Unfortunately, multiple-choice tests appear not to measure some 
higher-level skills well, though they can assess certain skills that are often
referred to as higher level. For example, multiple-choice measures can test
a student's ability to solve mathematical word problems, which require a
higher level of skills than those required by simple computational exercises.
Similarly, multiple-choice items can be designed to require sophisticated
levels of reasoning, as a perusal of items from the SAT or ACT clearly
indicates. Nonetheless, research suggests that it is difficult--although not
impossible--to write multiple-choice items that successfully measure certain
aspects of reasoning, analytic thinking, and problem-solving abilities.
As a result, performance on multiple-choice questions often depends more 
on factual knowledge and less on these higher-level skills than is 
intended. 9/ 



While this research indicates that multiple-choice tests have 
important limitations, it does not clarify the extent to which the use of such 
tests poses serious problems for the assessment of elementary and secondary 
school achievement. The degree to which the skills tapped by multiple- 
choice tests overlap with the set of skills that schools wish to foster remains 
a matter of debate but presumably varies considerably with subject matter 
and the age and ability level of students. Similarly, whether--or in what
circumstances--the problems of alternative tests outweigh those of
multiple-choice tests is a matter of argument.



How Well Does the Test Assess What It Is Intended to Test?

Whether achievement tests actually measure what they purport to is an
underlying theme in the current debate about the proper role of testing. 

Validity. The extent to which a test can be shown to test the skills that it is 
intended to test is called its validity. Simple subjective estimates of a 
test's validity are often misleading, and validity is therefore measured in a 
number of other ways. 

In most cases, tests are validated by comparing performance on the 
test with some other criterion that can serve as a benchmark for the skills 
of interest. Unfortunately, straightforward criteria against which to 
validate achievement tests are rarely available. (If they were, the tests 
would often be superfluous.) For example, standardized tests originated in 
part as a substitute for teachers' judgments, which were deemed too 
subjective. Yet current standardized achievement tests are sometimes in 
part validated, for want of better criteria, by comparing scores on the tests 
with teachers' grades, or with scores on other similar tests. 10/ 



9. More discussion of this issue can be found in Norman Frederiksen, "The Real Test Bias: 
Influences of Testing on Teaching and Learning," American Psychologist, vol. 39 (March 
1984), pp. 193-202. 

10. For example, see SRA Achievement Series, Technical Report #3 (Chicago: Science 
Research Associates, 1981). 

One particularly important benchmark against which to validate tests 
is the closeness of the fit between the test and the curriculum to which 
students are exposed. This criterion, called curricular validity, has 
received increasing attention in recent years as a result of the spread of 
minimum-competency testing and the growth of litigation about test use. 11/ 
If a test matches the curriculum poorly, it will provide misleading 
information about students' mastery of course material and about the 
effectiveness of teaching. It can also increase the influence that irrelevant 
factors, such as students' socio-economic background, have on scores and, 
in some cases, bias trends. 12/ 

Reliability. Another characteristic of achievement tests that is closely tied 
to validity is the consistency of the scores they yield, which is referred to as 
test reliability. That is, if it were possible to administer equivalent 
tests several times, without the learning that would accompany repeated 
experience, how consistent would the results be from one administration of 
the test to the next? A reliable test is one that would show little variation; 
an unreliable test would show more. A test cannot be valid if it is highly 
unreliable, for the scores and rankings produced by an unreliable test largely 
reflect random error rather than the skills that the test purports to 
measure. It does not follow, however, that a test is valid merely because it 
is reliable; it can provide consistent estimates of the wrong thing. A highly 
consistent algebra test is not valid as a measure of knowledge of geometry. 



11. For example, a central issue in Debra P. v. Turlington, a suit concerning Florida's use 
of a minimum competency examination as a criterion for high-school graduation, was 
whether the skills and knowledge required by the MCT were actually taught in the 
Florida schools. Debra P. et al. v. Turlington, et al., 474 F.Supp. 244 (U.S. Dist. Ct., 
Fla. 1979) Affirmed in part/Vacated in part/Remanded 644 F.2d 397 (5th Cir. Ct. 1981). 

Educators often draw a further distinction between curricular validity and instructional 
validity. The former refers to the correspondence between the test and the content of 
the curriculum materials, while the latter refers to correspondence with what is actually 
taught. (The courts have often spoken of curricular validity even when instructional 
validity was the principal issue.) While this distinction can be important in determining 
the validity of a test, it is not critical here, and both concepts are subsumed under the 
term "curricular validity" in this paper. See Peter W. Airasian and George F. Madaus, 
"Linking Testing and Instruction: Policy Issues," Journal of Educational Measurement, 
vol. 20 (Summer 1983), pp. 103-118. 

12. For example, changes in curricular validity might underlie the fact that the ACT 
mathematics test results have not shown the sharp upturn that the SAT mathematics 
test results have shown in the past several years. Unlike the SAT, the ACT is intended 
to reflect the high school curriculum. One-fifth of the ACT mathematics test comprises 
geometry items, and a decline in the teaching of geometry as a distinct subject might 
be depressing scores, preventing an upturn like that of the SAT. (Personal 
communication, Mark Reckase, American College Testing Program, January 1985.) 

Reliability is increased by repeated measurements. For example, a 
single measurement using an erratic thermometer would inspire little 
confidence, for a second reading might be very different. The average of 
many readings, however, would inspire more confidence, since the random 
errors would tend to be canceled out. Similarly, multiple measures of 
achievement are generally more reliable than a single measure. Indeed, 
adding additional information on a student's achievement will sometimes 
increase the reliability of the resulting conclusion even if the new 
information is itself less reliable than the old. For example, adding information 
about teachers' assessments of students to scores on a standardized test will 
sometimes increase the reliability of the conclusion even if the teachers' 
assessments are somewhat less reliable than the test. 13/ 
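
The point about combining measures can be illustrated with a short 
calculation. This is only a sketch under simplifying assumptions that are not 
in the text: both measures are taken to reflect the same underlying true score 
(with variance fixed at 1), their errors are independent, and the composite is 
an unweighted sum. The function names and the reliabilities of 0.9, 0.8, and 
0.5 are hypothetical. 

```python
def error_variance(reliability):
    """Error variance implied by a reliability when true-score variance is 1.

    Uses reliability = true_var / (true_var + error_var) with true_var = 1.
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return 1.0 / reliability - 1.0


def composite_reliability(rel_a, rel_b):
    """Reliability of the unweighted sum of two measures of one true score.

    Assumes (hypothetically) independent errors and a shared true score T,
    so the sum is 2*T plus the two errors.
    """
    true_var = 4.0  # Var(2*T) when Var(T) = 1
    err_var = error_variance(rel_a) + error_variance(rel_b)
    return true_var / (true_var + err_var)


# A test with reliability 0.9 combined with a somewhat less reliable measure.
print(round(composite_reliability(0.9, 0.8), 3))
# The same test combined with a much less reliable measure.
print(round(composite_reliability(0.9, 0.5), 3))
```

Under these assumptions, adding the measure with reliability 0.8 raises the 
composite above 0.9, while adding the measure with reliability 0.5 lowers it. 
With weaker correlation between the measures, the benefit would shrink. 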

All tests entail some unreliability, but that is generally not a problem 
when considering trends or comparisons between groups, since the errors of 
measurement tend to cancel each other out when scores of many students 
are averaged. It can be a serious problem, however, when test scores are 
used to make decisions about individual students. Some of those decisions 
will invariably be incorrect if single tests are used as the basis for judgment. 
For example, consider a hypothetical requirement that students score above 
the average (475) on the SAT-mathematics to graduate from high school. 
About one-sixth of all students with "true" scores of 508 would obtain failing 
grades on any one administration of the test, as would about a third of 
students with true scores of 490. 14/ The SAT is widely considered to be a 
very well-constructed test, and the error rate using many other tests would 
likely be far higher. 
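
The figures in this hypothetical can be checked with a short calculation. The 
sketch below assumes, beyond what the text states, that a student's observed 
scores are normally distributed around the true score with a standard 
deviation equal to the 34-point standard error cited in the accompanying 
footnote. The function name is illustrative. 

```python
from statistics import NormalDist

CUTOFF = 475          # hypothetical passing threshold from the text
STANDARD_ERROR = 34   # standard error cited in the accompanying footnote


def probability_of_failing(true_score, cutoff=CUTOFF, se=STANDARD_ERROR):
    """Chance that one administration yields a score at or below the cutoff.

    Assumes normally distributed measurement error (an assumption made
    here for illustration, not a claim about the actual test).
    """
    return NormalDist(mu=true_score, sigma=se).cdf(cutoff)


for true_score in (508, 490):
    print(true_score, round(probability_of_failing(true_score), 2))
```

Under that assumption the two probabilities come out at roughly 0.17 and 
0.33, consistent with the "about one-sixth" and "about a third" given above. 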



How Are the Scores Scaled and Reported? 

The scaling of test scores, and the form in which they are reported, can 
dramatically affect the results obtained, particularly when comparisons 
between groups or trends over time are of interest. Unfortunately, the ways 
of scaling and reporting scores that seem the most straightforward are often 
especially misleading. 



13. Whether adding information from a less reliable measure increases or decreases 
reliability depends on the correlation between the various measures as well as the 
reliability of each. Adding information from a measure that is highly unreliable and 
largely uncorrelated with the original measure is more likely to reduce the reliability 
of the composite measure. Adding information from a measure that is nearly as reliable 
as the original and that is highly correlated with it is more likely to increase reliability. 

14. These calculations are based on a standard error of estimate of 34 points. Solomon 
Arbeiter, Profiles: College-Bound Seniors, 1984 (New York: The College Board, 1984), 
p. iii. 

One of the simplest methods of scoring tests is to express the scores 
as the percentage of items correctly answered, without regard for the 
relative difficulty of different items. This method is the standard in many 
classroom tests and was also the primary method of reporting results of the 
National Assessment of Educational Progress until recently. 

Despite their outward simplicity, percentage-correct scores say 
relatively little about an individual's achievement and even less about the 
differences between individuals or groups. For example, what level of 
achievement would be indicated by a score of 60 percent correct on the 
National Assessment mathematics test? Is an improvement of 20 percentage 
points from that level comparable in significance to a decline of 20 
percentage points? Lacking information about the level of difficulty of the 
items answered correctly or about the distribution of scores among students, 
these questions cannot be answered. 

The most common solution to this problem is to translate scores into 
an alternative, comparative form that indicates where one student's score 
falls relative to all others. One common form is standard deviations, 
described earlier in this chapter. Another is percentiles. For example, the 
score of a student whose performance exceeded that of three-fourths of all 
others would be reported as being at the 75th percentile. Yet another, less 
commonly used now than in the past, is the "grade-equivalent score." In this 
scale, each student's score is expressed as the grade (often, year and month) 
of school in which the typical student attains a comparable score. 

None of these scaling methods provides an unambiguous estimate of 
achievement differences between individual students or groups of students, 
but they can yield enough information to be useful. A comparative scale 
can indicate, for example, the percentile ranking that the average student in 
one ethnic group would attain if compared with students in another. It 
would not indicate, however, the relative amounts of skills and knowledge 
gained by typical students in both groups. A simple percent-correct 
measure provides less information. One can calculate, for example, the 
proportional difference between the average percent of correct answers in 
two ethnic groups (as has been done in Chapter 4 with the National 
Assessment data), but the meaning of those differences is unclear. 

When comparing trends over time in different groups, the ambiguity of 
all of the scales becomes more serious. For example, consider a situation in 
which both low-achieving and high-achieving students appear to be gaining 
over time on a percentage-correct measure, but low achievers appear to be 
gaining faster. (A pattern of this sort appeared during part of the 1970s in 
some of the National Assessments.) For simplicity, say that the average 
student in the low-scoring group went from having 20 percent to 40 
percent correct answers, while the score of the average student in the high- 
achieving group increased from 80 percent to 90 percent. Without further 
information (such as the content and difficulty of the additional items each 
group answered correctly and the mix of items in the test), it is not obvious 
that the improvement in the lower group really reflects a greater 
achievement gain. For example, the improvement in the lower group might reflect 
a moderate increase in the proportion of many simple arithmetic items 
answered correctly, while the ostensibly smaller improvement in the higher 
group might reflect a sharp increase in the proportion of a few difficult 
algebra problems answered correctly. Information akin to this is rarely 
available from published sources, but even when it is, deciding which 
improvement is greater requires a subjective judgment. 15/ 

The use of comparative measures lessens these ambiguities, but it does 
not eliminate them. By using a comparative measure, such as standard 
deviations, one can ascertain which group changed more relative to the 
distribution of scores. Two ambiguities remain, however. First, the 
substantive meaning of a change from, say, 0 to 0.1 standard deviations 
(SDs) above the average might be quite different than that of an increase 
from 1.0 to 1.1 SDs above the average. On a mathematics test, for 
example, the first change might reflect improvements in computational 
abilities, while the second one reflected improvement in solving multi-step, 
multi-operation word problems. Second, different comparative measures 
can yield inconsistent answers. For example, relative trends expressed in 
SDs can be different from changes expressed in percentiles. In the previous 
example, an increase from 0 to 0.1 SDs above the average corresponds to an 
increase from the 50th to the 54th percentile, while the increase from 1.0 to 
1.1 SDs above the mean, equivalent in terms of SDs, corresponds only to 
an increase from the 84th to the 86th percentile. Which of these measures 
is more meaningful is a matter of debate and depends in part on the question 
being addressed. 
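
The conversion between standard deviations and percentiles can be sketched 
in a few lines. The calculation assumes that scores follow a normal 
distribution, which the text does not state explicitly; the function name is 
illustrative. Under that assumption, a score exactly at the average sits at 
the 50th percentile. 

```python
from statistics import NormalDist


def percentile_rank(sd_units):
    """Percentile of a score lying `sd_units` SDs above the mean.

    Assumes normally distributed scores (an illustrative assumption).
    """
    return 100.0 * NormalDist().cdf(sd_units)


# Two gains of identical size in SD units, starting from different points.
for start, end in ((0.0, 0.1), (1.0, 1.1)):
    gain = percentile_rank(end) - percentile_rank(start)
    print(f"{start} -> {end} SDs: "
          f"{percentile_rank(start):.0f}th -> {percentile_rank(end):.0f}th "
          f"percentile (gain of {gain:.1f} points)")
```

The same 0.1 SD gain translates into a larger percentile change near the 
middle of the distribution than in its upper tail, because scores are more 
densely packed near the average. 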



USING TESTS TO GAUGE TRENDS OR COMPARE JURISDICTIONS 



The characteristics of the tests themselves are important in determining the 
results of achievement tests. But when tests are used to compare 



15. The compression of high and low scores by percent-correct measures exacerbates this 
ambiguity. For example, in this instance, the high-achieving group could never show 
an improvement larger (in terms of simple differences) than that of the low-achieving 
group, for that would require scores above 100 percent correct. 

jurisdictions (schools, districts, or states) or to gauge trends, several other 
considerations also become critical. These factors, while diverse, reflect a 
single underlying problem. In each case, the difficulty is that extraneous 
variation in test scores (for example, that reflecting disparities in students' 
backgrounds) is confounded with relevant variation (such as that attributable 
to differences in school effectiveness). 



Differences in the Composition of the Tested Groups 

Disparities in average test scores among jurisdictions need not indicate 
differences in the achievement of comparable students or, by implication, 
differences in the effectiveness of educational programs. Average test 
scores can differ, in some cases dramatically, because of disparities in the 
makeup of the groups of students tested. These compositional differences 
can have several sources. 

One of the most important of these is differences in the ethnic 
composition of the student population. The gap in average scores between 
some ethnic groups tends to be very large, so even relatively small 
differences in ethnic composition can have a major impact on average 
scores. Moreover, differences in ethnic composition are often great. For 
example, the minority enrollments of the states varied in 1980 from 1 
percent or less in Vermont and Maine to 67 percent in New Mexico, 75 
percent in Hawaii, and 96 percent in the District of Columbia. Similarly, a 
1982 survey of nearly 90 large school districts found minority enrollments 
ranging from over 90 percent in the District of Columbia, Atlanta, and 
Newark to 5 percent in Cobb County, Georgia, and Jordan County, Utah. 16/ 
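
The arithmetic behind this compositional effect is that a jurisdiction's 
average is a weighted mean of its subgroup averages. The sketch below uses 
entirely hypothetical numbers (two subgroups with average scores of 50 and 
40, and two made-up jurisdictions that differ only in the mix of students) to 
show that averages can differ even when comparable students score 
identically. 

```python
def overall_mean(group_means, group_shares):
    """Jurisdiction-wide average as a share-weighted mean of subgroup averages."""
    if len(group_means) != len(group_shares):
        raise ValueError("need one share per subgroup")
    if abs(sum(group_shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(m * s for m, s in zip(group_means, group_shares))


# Hypothetical subgroup averages, identical in both jurisdictions.
subgroup_means = [50.0, 40.0]

# Hypothetical enrollment mixes: 90/10 in one jurisdiction, 50/50 in the other.
jurisdiction_x = overall_mean(subgroup_means, [0.9, 0.1])
jurisdiction_y = overall_mean(subgroup_means, [0.5, 0.5])

print(jurisdiction_x, jurisdiction_y, jurisdiction_x - jurisdiction_y)
```

In this illustration the two jurisdictions differ by about four points on 
average although each subgroup performs identically in both, so the gap 
reflects composition alone. 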

Differences in dropout rates are another important source of 
compositional differences in the higher grades. Because dropouts tend to be low 
achievers, higher dropout rates will elevate a jurisdiction's average test 
scores. 

Various educational policies also contribute to differences in the 
composition of tested groups. For example, rules governing the testing of 
handicapped students, the testing of students with limited proficiency in 
English, promotion from one grade to the next, and the testing of 
out-of-grade students can all have a substantial effect on average test scores. 



16. CBO calculations based on data from the Office of Civil Rights, U.S. Department of 
Education. 

All of these factors can bias trends as well as comparisons among 
jurisdictions in any one year. For example, districts experiencing atypically 
rapid growth in the share of their enrollments comprising certain minority 
groups would be likely to show less favorable trends than would others. 
Similarly, jurisdictions adopting particularly inclusive testing policies or 
finding successful methods to combat dropping out could make their 
achievement trends appear less favorable than they otherwise would. 17/ 



How Are the Tests Made Comparable from Year to Year? 

When trends in achievement are a concern, the methods used to make a test 
substantively comparable from year to year become critical in interpreting 
the results obtained. The simplest method of maintaining comparability 
over time is to keep the test the same. That is often unacceptable, 
however, for a number of reasons. Students and teachers might learn the 
content of a test, thereby artificially inflating scores (and lowering the 
test's validity) over time. Curricular changes might call for alteration of 
test content, and changes in student characteristics and performance might 
necessitate revision of test norms. 

Faced with these problems, most test producers modify tests 
periodically and establish a new set of norms for the revised form. Scores on the 
revised test, however, need not be similar to those that the same students 
would receive if administered the old form. 

In order to permit comparisons of the results of the old and revised 
forms, most test producers then estimate a mathematical relationship 
between the scores yielded by both versions. This process, called 
equating, can be done in several ways. The most straightforward is to 
administer both forms of the test to a single sample of students. In that 
case, differences in the scores yielded by the two versions must reflect 
changes in the test, and the scoring of the revised version can be adjusted to 
compensate, so that each student's score on the revised version is roughly 
that obtained on the old version. 18/ Another method requires including in 
the revised form a set of items from the old test. One can then administer 



17. The impact of several compositional changes (such as changes in the self-selection of 
students to take college-admissions tests and trends in drop-out rates) on recent 
achievement trends is assessed in Congressional Budget Office, Educational 
Achievement: Explanations and Implications of Recent Trends (forthcoming). 

18. Because tests are not perfectly reliable, the scores obtained by an individual student 
on the two versions would not typically be identical even after this adjustment. Equating 
can remove much of the systematic change in scores attributable to revisions of the 
test, but other variation in students' scores remains. 

the revised form to a sample of students and compare their scores on the 
new test as a whole with their scores on the shared items. If the 
relationship between performance on the shared items and scores on the old 
test in its entirety is understood, students' scores on the set of shared items 
can act as a proxy for the scores they would have received on the old test. 
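
The single-sample approach to equating can be sketched with one simple 
technique, linear equating, which rescales scores on the revised form so that 
their mean and standard deviation match those of the old form in the same 
sample. This is an illustrative choice, not necessarily the method any 
particular test producer uses, and the scores and function names below are 
hypothetical. 

```python
from statistics import mean, pstdev


def make_linear_equating(old_scores, new_scores):
    """Build a function mapping revised-form scores onto the old-form scale.

    Assumes both lists come from one sample of students who took both forms
    (the single-sample design described above) and that a linear relationship
    is adequate, which is an assumption of this sketch.
    """
    old_mean, old_sd = mean(old_scores), pstdev(old_scores)
    new_mean, new_sd = mean(new_scores), pstdev(new_scores)
    if new_sd == 0:
        raise ValueError("revised-form scores must vary")
    slope = old_sd / new_sd

    def to_old_scale(new_score):
        # Match standardized positions: same number of SDs from the mean.
        return old_mean + slope * (new_score - new_mean)

    return to_old_scale


# Hypothetical scores for three students on each form.
old_form = [10, 20, 30]
new_form = [12, 16, 20]

to_old_scale = make_linear_equating(old_form, new_form)
print(to_old_scale(16), to_old_scale(20))
```

In this toy sample the revised form compresses the score range, so the 
adjustment stretches revised-form scores back out: the sample's average 
revised score maps to the average old-form score. As the footnote on 
reliability notes, such an adjustment removes systematic differences between 
forms but not the random variation in individual scores. 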

Annually Equated Tests. Annually equated tests are by far the most 
valuable in assessing achievement trends. When a test is equated every 
year, any given score reflects a comparable level of achievement in each 
year, and changes in scores can confidently be considered as differences in 
achievement. These differences, however, can reflect changes in the 
characteristics of the students tested as well as differences in the amount 
achieved by students of any given type. 

Equating is a burdensome activity, and therefore very few tests are 
equated annually. In the absence of annual equating, interpretation of 
achievement trends is risky, although how risky depends on a variety of 
other aspects of the test. Accordingly, four tests that are annually equated 
(the SAT, the ACT, and the Iowa series of the Iowa Test of Basic Skills and 
the Iowa Test of Educational Development) are given particular attention in 
the analysis of trends in the following chapters. 

Periodically Equated Tests. The periodic renorming of norm-referenced 
elementary and secondary achievement tests is the most common 
alternative to annual equating among tests that are formally equated at all. But it 
creates trend data that must be interpreted somewhat differently than are 
the data from annually equated tests. 

Norm-referenced tests are typically renormed once every seven years 
or so, when new forms of the test are administered to national samples 
created by the tests' publishers. The resulting norms are used as a standard 
of comparison by schools that use the test for the following seven years or 
so. Publishers frequently equate the norming sample scores. This creates 
two types of information on trends: comparisons of norming-sample scores 
themselves, and annual comparisons of the scores obtained by districts and 
states using the test. 

When test publishers equate the norming sample scores, comparisons 
of those scores can provide useful information on changes in achievement 
over the seven or so years between normings. Because each norming sample 
is intended to represent the national test-taking group at that time, the 
changes in the norms yielded by each sample in part reflect changes in the 
composition of the test-taking groups. The equating of norming sample 
scores, however, provides trend data that are in theory independent of 
changes in student characteristics. 

These comparisons have two important limitations, however. First, 
because there are no comparable data from the years between normings, 
comparisons of norming sample scores can be misleading when achievement 
trends change over that interval. For example, if achievement was 
declining at the time of one norming but began increasing midway between 
then and the next norming, a comparison of the two norming samples might 
show no change at all, a pattern that would be entirely misleading unless 
annual data were available as a clue about trends in the intervening years. 
Second, in recent years questions have been raised about the adequacy of 
the publisher's national samples and changes in those samples over time 
stemming from changes in districts' willingness to participate in them. 19/ 
Both nonrepresentativeness of norming samples and changes in their 
characteristics could substantially bias analysis of trends. 

The annual, state- or district-wide data obtained from tests that are 
periodically renormed have a different set of advantages and disadvantages. 
During the period between normings (that is, while a single set of norms is 
used as the standard of comparison) these data provide a fairly good 
indicator of trends in the particular jurisdiction, except that growing 
familiarity with the test sometimes artificially increases scores or partially 
masks a decrease. 20/ These trends, however, are confounded with changes 
in the composition of the test-taking group in the jurisdiction taking the 
test. On the other hand, during years of transition to a new set of norms, 
this system can produce serious distortions of achievement trends. 21/ For 



19. For example, Roger F. Baglin, "Does 'Nationally' Normed Really Mean Nationally?" 
Journal of Educational Measurement, vol. 18 (Summer 1981), pp. 97-108. 

20. Personal communication, Gene Guest, California Test Bureau of McGraw-Hill, December 



21. This distortion appears to have occurred, for example, in the Virginia statewide 
assessment, where adopting a new test form and set of norms produced sizable changes 
in scores in some subject areas that were not predicted on the basis of the national 
norming data. S. John Davis and R. L. Boyer, Memorandum to Division Superintendents: 
Spring 1982 SRA Test Results (Richmond: Virginia State Department of Education, 
July 19, 1982). 

Periodically equated tests can also produce spurious changes when attempting to gauge 
a jurisdiction's level of achievement relative to the nation as a whole. For example, 
in a period when achievement is generally going up, as has been the case recently, 
most districts or states will see their scores rising relative to the old norms. This rise 
does not necessarily indicate that they are truly improving relative to the nation as 
a whole, but merely that the old norms are out of date. These jurisdictions are improving 
relative to what the national level of achievement used to be, but they could be improving 
either faster or slower than the nation as a whole. 

this reason, the following chapters cite annual data from periodically 
normed tests only for the periods that a single set of norms was used. 

Tests That Are Not Equated. Finally, some of the tests that have been used 
to illustrate recent achievement trends are not formally equated at all. The 
most important of these is the National Assessment of Educational Progress 
(NAEP), which was not equated until the most recent assessment of 
reading. 22/ The absence of formal equating raises the level of uncertainty 
in any analysis of trends. 

In the case of the NAEP, until recently the alternative to formal 
equating was to repeat a sizable proportion of the test items in subsequent 
assessments. Familiarity with test items is presumably not a problem in this 
case for a number of reasons: the test is administered only to a sample of 
children; it is administered only once every several years; and each student 
takes only a portion of the total test. Nonetheless, the procedure creates 
uncertainty. The method of assessing trends has most often been to 
compare adjacent assessments only in terms of the items shared by those 
assessments. The extent to which those items are representative, however, 
is open to question. Moreover, in at least one instance, the number of items 
shared over three assessments was so small that two different sets of items 
had to be used for the middle assessment: one for comparison to the earlier 
assessment (containing all items shared with that assessment), and another 
for comparison to the subsequent assessment. 23/ This might have biased 
the assessment of trends. 

Differences in Curricular Validity 

Both analysis of trends and comparisons among jurisdictions can also be 
distorted by differences in curricular validity, that is, in the fit between a 
test and the curriculum. In both cases, the distortion is the same: groups 
for which curricular validity is lower will score comparatively lower than 
others, even if their actual level of achievement is similar. Typically, one 
might expect this problem to be less tractable when the domain of 
achievement being examined is complex than when it is narrow and simple. 
Devising a test of two-digit subtraction that has roughly comparable validity 
among districts, for example, might be much more feasible than designing 



22. The most recent (1983) NAEP reading test was equated with all previous NAEP reading 
assessments (1970, 1974, and 1979). 

23. National Assessment of Educational Progress, Three National Assessments of Science: 
Changes in Achievement, 1969-77 (Denver: NAEP/Education Commission of the States, 
1978). 

one in the area of intermediate algebra, which is broader and confronts 
designers of both curricula and tests with a wider array of choices. 

The effects of curricular validity can be particularly vexing in 
assessing trends for another reason. When schools change the mix of skills 
they teach, there is no unambiguous way of equating tests over time unless 
some other criterion of achievement, independent of the schools' goals and 
curricula, is used as the basis for testing. For example, consider a situation 
in which an elementary school adds metric measurements to its mathematics 
curriculum, while eliminating the manual calculation of square roots. 
If a test that had high curricular validity before the change in curriculum is 
continued after the change, scores will decrease since students will more 
often fail to answer items about square roots, and there will be no items to 
compensate by testing their new knowledge of metric measures. 24/ 

One alternative is to change the tests to mirror changes in curriculum. 
If that is done, however, it is not obvious what levels of achievement are 
truly comparable among tests. Is proficiency in set terminology (a major 
addition to the mathematics curriculum during the years of the "new math") 
equivalent to facility in arithmetic computation (a mainstay of the "old" 
math)? While methods have been devised to estimate whether the items in 
the two domains are of comparable difficulty in a specific population, the 
question of whether these substantively different skills are "comparable" 
remains subjective. In addition, since changes in curriculum are generally 
only partly known, the question of whether the new and old tests have 
similar levels of curricular validity will remain in some doubt. 



24. The effects of even relatively small changes in test content can be substantial, as is 
suggested by the recent experience of the statewide assessment program in Nevada, 
where changing to a revised form of the same norm-referenced test altered the ranking 
of districts in terms of average scores. This change in the districts' performance, however, 
might also reflect changes in test characteristics other than content, such as changes 
in format. (George Barnes, evaluation consultant, Nevada State Department of 
Education, personal communication, January 1985.) 

CHAPTER III 



AGGREGATE TRENDS IN 
EDUCATIONAL ACHIEVEMENT 



Over the past several years, bad news has predominated in the public 
debate about educational achievement in the United States. Such 
developments as the decline in achievement that began in the 1960s, the 
unexceptional performance of American students relative to their 
counterparts in some other countries, and, most recently, the large gap in average 
achievement scores between black and white students have garnered 
widespread attention and have generated considerable concern. Less well known 
are some positive trends. For example, average achievement stopped 
declining some time ago and, by many measures, is rebounding sharply, and 
the gap between white and black students, while still large, has been 
shrinking. 



THE DECLINE IN ACHIEVEMENT 



Although not all indicators of educational achievement showed large 
declines over the past two decades, the great majority did, leaving no 
question that the decline was real and not an artifact of specific tests. The 
decline was widespread, appearing among many types of students, on many 
different types of tests, in many subject areas, and in all parts of the 
nation. Moreover, in many instances, the decline was large enough to be of 
serious educational concern. 1/ Average scores declined markedly, for 
example, on the following achievement measures: 2/ 



1. The pervasiveness and magnitude of the decline were discussed in a number of earlier 
reviews. The breadth and size of the subsequent upturn in achievement, however, has 
not been previously assessed. Most of the early reviews were published before the 
characteristics of the upturn, or even its existence, were apparent. For earlier reviews 
of the decline, see especially Annegret Harnischfeger and David E. Wiley, Achievement 
Test Score Decline: Do We Need to Worry? (Chicago: ML-GROUP for Policy Studies in 
Education, 1975); also, Anne T. Cleary and Sam McCandless, Summary of Score Changes 
(in Other Tests) (New York: College Entrance Examination Board, 1970); and Brian 
K. Waters, The Test Score Decline: A Review and Annotated Bibliography (Technical 
Memorandum 81-2) (Washington, D.C.: Directorate for Accession Policy, Department 
of Defense, August 1981). 

2, See Appendix A for explanation of the principal data sources used in this paper. 
o College-admissions tests--the Scholastic Aptitude Test (SAT) and
the American College Testing Program tests (ACT);

o Most tests in the National Assessment of Educational Progress
(NAEP);

o Comparisons of periodic large representative samples of students
--Project TALENT, the National Longitudinal Survey of the High
School Class of 1972 (NLS), and the High School and Beyond (HSB)
survey;

o Periodic norming data from commercial standardized tests of
elementary and secondary achievement;

o The annual Iowa assessment of student achievement (which
provides some of the most comprehensive and useful information on
elementary and secondary achievement trends); 3/ and

o A number of other state-level assessments of achievement.

On the other hand, a variety of achievement tests did not show large
declines. In some cases, the exceptions were consistent over a number of
tests, while in others, they appeared to be simply idiosyncratic. The most
consistent exception was tests administered to children in the early
elementary school grades. Among fourth-grade students, for example, declines
appeared only inconsistently and were generally small. Moreover, there was
apparently no substantial decline at all at even younger ages--by one
measure, for example, third-grade scores showed a large, three-decade
increase interrupted only by a brief pause and trivial decline in the 1960s
and early 1970s. A variety of other tests--for example, the ACT natural
science test--also showed only small declines or no decline at all. These
exceptions, however, were so few that they do not call the overall decline
into question.



When Did the Decline Begin and End? 

The beginning of the achievement decline and its end showed markedly
different patterns. To clarify the difference, it is helpful to distinguish
between three patterns: "period effects," "cohort effects," and "age
effects." In practice, a mixture of these three patterns is often found in
achievement data.



3. The Iowa data are unique in providing annually equated data extending over many
years, in many subject areas, and in all grades from 3 through 12 (see Appendix A).
A period effect refers to a change that occurs in a specific time
period, such as a decline in test scores that starts in roughly the same year
among students of different ages or grade levels (see Figure III-1). In
contrast, a cohort effect is a change that occurs with a specific birth
cohort. An example would be a decline in scores that began with a
particular birth cohort, appearing first in an early grade and then moving
into the higher grades at a rate of roughly a grade per year as that birth
cohort aged (see Figure III-1).

An age effect is a change that is linked to the age of those
tested--perhaps occurring only in one age group, or varying in size from one
age group to another. Age effects can occur with either cohort or period
effects and, when data are incomplete, it can be impossible to disentangle
them fully. For example, test scores have been rising in recent years. They
started rising more recently in the higher grades, however, and to date have
shown a smaller total increase in those grades than in the lower grades.
This pattern could result entirely from the fact that scores in the higher
grades have had fewer years to rise--that is, fewer of the cohorts
contributing to the rise in scores have as yet reached the higher grades. In
that case--a pure cohort effect--scores in the higher grades would be
expected to continue rising in the near future as more of those cohorts pass
through the higher grades (see Figure III-1). Alternatively, the pattern
might reflect an age effect as well. Perhaps the lesser gains in the higher
grades truly reflect less progress in those grades, as well as the later start
of the upturn. This pattern might take the form of some cohorts not
showing progress in the higher grades over the next few years comparable to
that which they produced when in the lower grades (see Figure III-1).
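
The distinction between period and cohort effects can be illustrated with a
rough sketch. The code below is not part of the original analysis; it
assumes that the typical student in a given grade is about five years older
than the grade number (so a twelfth grader is 17), and the function names
are hypothetical. It asks whether a set of turning points lines up more
closely by test year (a period effect) or by implied birth year (a cohort
effect), using the Iowa test years reported in Table III-1.

```python
from statistics import pstdev


def implied_birth_year(test_year: int, grade: int, age_offset: int = 5) -> int:
    """Birth year of the typical student in `grade` during `test_year`.

    Assumes typical age = grade + age_offset (an illustrative assumption).
    """
    return test_year - (grade + age_offset)


def dominant_pattern(turning_points: dict[int, int]) -> str:
    """Label turning points (grade -> test year) as nearer a period or cohort effect.

    A pure period effect has identical test years across grades;
    a pure cohort effect has identical implied birth years.
    """
    test_years = list(turning_points.values())
    birth_years = [implied_birth_year(y, g) for g, y in turning_points.items()]
    return "period" if pstdev(test_years) < pstdev(birth_years) else "cohort"


# Test years for the Iowa series (grades 5, 8, and 12) from Table III-1.
onset = {5: 1966, 8: 1966, 12: 1968}
end = {5: 1974, 8: 1976, 12: 1979}

print("onset:", dominant_pattern(onset))
print("end:", dominant_pattern(end))
```

With only three grades, this comparison is merely suggestive; it does not
address age effects or the compositional changes discussed elsewhere in
this chapter.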

Very little information is available about the onset of the decline.
Such information as there is suggests--albeit weakly--that the decline was
a period effect, beginning relatively concurrently across a range of ages or
grades. In contrast, the end of the decline--about which more data are
available--shows a fairly clear cohort effect, occurring with a few specific
cohorts of children and moving up through the grades as those cohorts
passed through school. 4/ On the other hand, given variation from test to



4. The period and cohort effects--if they are not an artifact of inadequate information
--have substantial implications for the interpretation of the decline. Some observers
have argued that period effects may be more consistent with the effects of changes in
schooling, while cohort effects tend to suggest changes in student characteristics. See,
for example, Christopher Jencks, "Declining Test Scores: An Assessment of Six
Alternative Explanations," Sociological Spectrum, Premier Issue (December 1980),
pp. 1-15. This issue is discussed further in Congressional Budget Office, Educational
Achievement: Explanations and Implications of Recent Trends (forthcoming).
test and the paucity of data, the possibility remains that one or the other of these
patterns--particularly, the period pattern shown by the onset of the decline--is
merely a reflection of incomplete information. 5/

The few data sources that indicate the onset of the decline place it between
the 1963 and 1968 school years (see Table III-1). The variation in the year of
onset shows no obvious pattern from one test to another. The SAT began to
decline in the 1963 school year. 6/ The decline in the ACT appears to have begun
a few years later, in mid-decade. 7/ Scores in the Iowa statewide
assessment--the Iowa Tests of Basic Skills (ITBS) through grade 8, and the
Iowa Tests of Educational Development (ITED) in grades 9 and above--began
dropping in every grade from 5 through 12 between 1966 and 1968. 8/ The
Minnesota Scholastic Aptitude Test--a test independent of the College Board's
SAT which was administered to high school juniors in Minnesota until the
1970s--began declining in 1967 after nearly a decade of uninterrupted
increase. 9/



5. Only tests that provide annual or nearly annual data can be used to pinpoint the
beginning and end of the decline. Many of the major data sources--such as the
NAEP--have too great an interval between comparable tests to be useful in this regard.

Uncertainty about the timing of the decline's onset is heightened by the fact that the
early decline on two of the four tests that can be used to pinpoint the onset--the SAT
and ACT--was in substantial part a reflection of changes in the composition of the groups
taking the tests. If there had been no such compositional changes, the timing of the
decline on those tests might have been different.

6. Hunter M. Breland, The SAT Score Decline: A Summary of Related Research (New
York: The College Board, 1976).

7. L. A. Munday, Declining Admissions Test Scores (Iowa City: The American College
Testing Program, 1976). Scores on the ACT mathematics and social studies tests had
already begun declining between 1964 and 1965--the first years of available data--but
the decline was very small in the first year. The decline did not begin on the English
test until 1966.

8. "Mean ITED Scores by Grade and Subtest for the State of Iowa: 1962-Present," and
"Iowa Basic Skills Testing Program, Achievement Trends in Iowa: 1955-1985" (Iowa
Testing Programs: unpublished tabulations, 1984 and 1985).

9. Harnischfeger and Wiley, Achievement Test Score Decline.
TABLE III-1.  ONSET AND END OF THE ACHIEVEMENT DECLINE,
              SELECTED TESTS


                                 Onset                  End
                            ----------------      ----------------
                            Test       Birth      Test       Birth
Test                        Year       Year       Year       Year

SAT                         1963       1946       1979       1962
ACT Composite               1966       1949       1975       1958
ITBS Grade 5                1966       1956       1974       1964
ITBS Grade 8                1966       1953       1976       1963
ITED Grade 12               1968       1951       1979       1962
Minnesota Scholastic
  Aptitude Test             1967       1951       N.A.       N.A.


SOURCES: Hunter M. Breland, The SAT Score Decline: A Summary of Related Research
(New York: The College Board, 1976), Table 1; National College-Bound
Seniors, 1985 (New York: The College Board, 1985); L. A. Munday, Declining
Admissions Test Scores (Iowa City: American College Testing Program, 1976),
Table 3; National Trend Data for Students Who Take the ACT Assessment
(Iowa City: American College Testing Program, undated); Iowa Testing
Programs, "Mean ITED Scores by Grade and Subtest for the State of Iowa:
1962-Present" and "Iowa Basic Skills Testing Program, Achievement Trends
in Iowa: 1955-1985" (unpublished and undated); and Annegret Harnischfeger
and David E. Wiley, Achievement Test Score Decline: Do We Need to Worry?
(Chicago: ML-GROUP for Policy Studies in Education, 1975).

NOTE: N.A. designates not available.



The end of the decline (which can be ascertained with somewhat
greater certainty because of more plentiful data) generally occurred within
a few years of the birth cohorts of 1962 and 1963--that is, with the cohorts
that entered school in 1968 and 1969. Thus, the low point in most
achievement data occurred first in the lowest grades, moving into higher
grades at a rate of roughly one grade per year as the cohorts of 1962 and
1963 passed through school.

This cohort pattern, which was first noted by those working with the 
Iowa tests (the ITBS and ITED), also occurs in a wide variety of other test 
Figure III-2.

ITBS Composite Scores, Iowa Only, by Test Year
and Grade at Testing

SOURCE: "Iowa Basic Skills Testing Program, Achievement Trends in Iowa: 1955-1985" (Iowa Testing Programs,
unpublished and undated material).



series. 10/ The progression through the grades is somewhat erratic--perhaps
because of various unexplained year-to-year fluctuations in average
scores--and is therefore not always apparent from a comparison of a few
adjacent grades from a single test. The pattern becomes clearer, however,
when a range of grades and tests are considered. Thus, the decline generally
ended in the upper elementary grades in the mid-1970s, when the cohorts
born within a few years of 1962 reached the ages of 10 and 11 (see Figure
III-2). The decline in junior-high achievement ended a few years later.
Tests given primarily to high school seniors (such as the SAT and the grade
12 ITED) stopped declining around the 1979 school year, when the birth
cohort of 1962 was the appropriate age (see Figures III-3 and III-4). 11/



10. Leonard Feldt, of the Iowa Testing Programs, the University of Iowa, pointed out the
cohort pattern in the Iowa data (personal communication, December 1983).

This cohort pattern is particularly apparent in the Iowa data because they include annual
information from all grade levels above grade three. In many other cases, the pattern
becomes apparent only by comparing the timing of the decline's end among a variety
of tests administered in different grade levels. See Appendix B.

11. One salient exception to this pattern is the ACT, which reached its low point a few years
earlier.
Figure III-3.

ITED Composite Scores, Iowa Only, by Test Year
and Grade at Testing

SOURCE: "Mean ITED Test Scores by Grade and Subtest for the State of Iowa: 1962 to Present" (Iowa Testing
Programs, undated and unpublished tabulations).




SOURCES: CBO calculations based on Hunter M. Breland, The SAT Score Decline: A Summary of Related
Research (New York: The College Board, 1976), Table 1; and the College Entrance Examination Board,
National College-Bound Seniors, 1985 (New York: The College Board, 1985).
(The extent to which a number of tests conform to this pattern is
explained in Appendix B.)

The widespread misconception that the achievement decline ended
only within the past few years thus probably stems from the greater
attention paid to tests administered to high-school juniors and seniors--in
particular, the SAT. The tests that showed an end of the decline taking
place a decade or more ago are those given to young children, and they have
been the focus of considerably less attention.



How Large Was the Decline? 

The severity of the decline in achievement can be illustrated in two ways:
by examining the actual level of achievement shown by typical students in
each of two years (a criterion-based or absolute standard), or by comparing
the achievement of a typical student in one year to the distribution of
achievement in some other year (a normative standard). 12/ This section
applies both standards.

The Size of the Decline Relative to a Normative Standard. The few test
series reporting trend data in normative terms suggest that, at grades 6 and
above, the decline averaged about 0.3 standard deviation over the entire
period of the decline (see Table III-2). 13/ This average indicates that the
median student at the end of the decline would have scored at about the
38th percentile at the beginning of the decline. The severity of the decline
varies so greatly, however, that a single average has little value. At one
extreme, the largest decline in the measures considered here was 0.55
standard deviation, placing the median student at the end of the decline
roughly at the 29th percentile before the decline began. (The two largest
declines, however, were on college admissions tests--the SAT and
ACT--and were substantially exacerbated by changes in the composition of



12. Most often, the "typical" score is the mean or median in each year. Since the
characteristics of the groups taking most tests change over time, trends in these typical
scores in part reflect changes in student characteristics, rather than only changes in
the achievement of a student with any given characteristics.

13. See Chapter 2 for an explanation of standard deviations.

The numbers here do not adjust the SAT trends for "scale drift," a gradual drop in the
level of the difficulty of the test that led to an understatement of the SAT decline until
the early 1970s. That adjustment was not made because of a lack of information about
the severity or direction of any changes in difficulty since that time. If the adjustment
is made, however, the conclusions of this section are unaltered: the average decline
remains about 0.3 standard deviations, and the maximum decline is in the range of
0.57 to 0.60 standard deviations rather than 0.55.
TABLE III-2.  SIZE OF THE ACHIEVEMENT DECLINE, INDICATED
              BY SELECTED TESTS AT GRADE 6 AND ABOVE
              (Including only tests spanning all or nearly
              all of the decline)


                                                    Total Decline
                                                    (Standard
Test                        Subject                 Deviations)

Specific Tests

SAT a/
  Largest                   Verbal                  0.48
  Smallest                  Mathematics             0.28

Iowa Grade 12 (ITED)
  Largest                   Reading b/              0.40
  Smallest                  Mathematics             0.27

Iowa Grade 10 (ITED)
  Largest                   Reading b/              0.3?
  Smallest                  Natural Science         0.?

Iowa Grade 8 (ITBS)
  Largest                   Mathematics             0.47
  Smallest                  Vocabulary              0.26

Iowa Grade 6 (ITBS)
  Largest                   Mathematics             0.38
  Smallest                  Vocabulary              0.10

(Continued)
TABLE III-2.  (Continued)


                                                    Total Decline
                                                    (Standard
Test                        Subject                 Deviations)

ACT
  Largest                   Social Studies          0.55
  Smallest                  Science                 -0.06 c/

All Tests in Table
  Average                                           0.31
  Minimum                                           -0.06
  Maximum                                           0.55



SOURCES: CBO calculations based on Hunter M. Breland, The SAT Score Decline; College
Board, National College-Bound Seniors, 1978 and 1985; Iowa Testing
Programs, "Mean ITED Scores by Grade and Subtest for the State of Iowa:
1962-Present" and "Iowa Basic Skills Testing Program, Achievement Trends
in Iowa: 1955-1985" (unpublished and undated); Robert Forsyth, personal
communication, August 1984; A. N. Hieronymus, E. F. Lindquist, and H.
D. Hoover, Iowa Tests of Basic Skills: Manual for School Administrators
(Chicago: Riverside, 1982); L. A. Munday, Declining Admissions Test Scores;
and American College Testing Program, National Trend Data for Students
Who Take the ACT Assessment.

NOTE: Alternate grades (7, 9, 11) omitted for clarity.



a. SAT scores are not adjusted for scale drift. Research indicates that the first part of the
decline is understated by perhaps 0.09 standard deviations because of scale drift. The
extent and direction of scale drift over the past decade is not yet known, however.

b. This reflects the "Interpretation of Literary Materials" test. Reading skills also are
tapped by the other tests in the ITED battery.

c. Negative numbers represent an increase in average scores.
the test-taking group.) 14/ At the other extreme, the ACT natural
science test actually showed a trivial increase during the years of the
general decline. Thus, a different mix of tests--or a larger and more
representative sample of tests--might have yielded a very different average
size of the decline.
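
The percentile figures cited in this section follow from the size of the
decline in standard deviations. As a minimal sketch, assuming that scores
at the start of the decline were approximately normally distributed (an
assumption made here for illustration, not one stated in the data sources),
the conversion can be written as follows. The function name is hypothetical.

```python
from statistics import NormalDist


def percentile_before_decline(decline_in_sd: float) -> float:
    """Percentile that the post-decline median student would have held
    in the pre-decline distribution, assuming normally distributed scores.

    A decline of d standard deviations places that student d standard
    deviations below the earlier median.
    """
    return 100 * NormalDist().cdf(-decline_in_sd)


for label, decline in [("average decline", 0.3), ("largest decline", 0.55)]:
    print(f"{label}: {decline} SD -> percentile {percentile_before_decline(decline):.0f}")
```

Under that assumption, a decline of 0.3 standard deviation corresponds to
roughly the 38th percentile and a decline of 0.55 to roughly the 29th,
which agrees with the figures given in the text.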

Some of the variability in the size of the decline stems from known
causes, such as the age of the students tested and changes in the
composition of the groups of students taking the test. On the other hand,
much of the variation appears to stem from unknown factors or from
considerations that lie largely outside of the scope of this report, such as
decisions about the specific skills and knowledge to be tested.

The Size of the Decline Relative to an Absolute Standard. Although the
apparent severity of the decline varies with the absolute achievement
criterion chosen, the average decline was clearly large enough by many
standards to be educationally significant.

The best criterion-based gauge of the achievement decline is probably
the National Assessment of Educational Progress (NAEP). The NAEP
reflects representative samples of the national population of students, tests
students at the elementary, junior-high, and senior-high levels, and
encompasses a wide array of substantive areas and types of skills. Moreover,
actual test items from all of the NAEP assessments have been published,
along with the percentages of students of different ages answering each
item correctly. This information provides an intuitively clear view of
students' levels of achievement. 15/

Even the NAEP, however, should be used to illustrate the types of
skills that deteriorated rather than to indicate the total magnitude of the
decline. Because of the timing of NAEP assessments, most of them
understate the severity of the decline, in some instances probably by a very
large margin. The NAEP began with a science assessment in 1969, with
initial assessments in other subjects starting over the following several



14. Advisory Panel on the Scholastic Aptitude Test Score Decline, On Further Examination
(New York: The College Entrance Examination Board, 1977); L. A. Munday, Declining
Admissions Test Scores. The impact of compositional changes is discussed in
Congressional Budget Office, Educational Achievement: Explanations and Implications
of Recent Trends (forthcoming).

15. While annually equated tests provide much clearer information on trends, no such tests
have been tabulated in a way that facilitates comparison with an absolute achievement
criterion.
years. The most recent published assessments were largely carried out
between 1976 and 1981. Therefore, the NAEP trends exclude varying
portions of the early part of the decline and probably often mask the later
decline by mixing with it upturns in achievement that have occurred in
recent years. 16/

Although the NAEP tests students at ages 9, 13, and 17, this section
describes the results among 17-year-olds, because the trends among
17-year-olds are likely to include fewer, if any, years of the recent upturn in
achievement. 17/ (Comparable information on ages 9 and 13 appears in
Table III-3.)

Mathematics. Between 1972 and 1977, the proportion of NAEP
mathematics items answered correctly by 17-year-olds dropped from 64.0
to 60.4 percent (see Table III-3). While this decline appears modest, it
occurred over a time span that was probably less than half of the total
period of decline and also masks more substantial deterioration of
performance on certain important types of items. 18/ In addition, the rate of
success on certain types of items was remarkably poor in both years. One
NAEP computation item, for example, asked: "Express 9/100 as a percent."
The proportion of 17-year-olds answering this item correctly dropped eight
percentage points over the five years, from 61 percent to 53 percent.
Similar results were obtained by a problem that asked: "A hockey team won
6 of the 20 games it played. What percent of the games did it win?"
Another problem required students to use a simplified electrical bill to
determine the cost per kilowatt if 606 kilowatts produced a bill of $9.09.
The proportion of students succeeding on this item fell from 12 percent in
1973 to 5 percent in 1978. 19/



16. Because NAEP assessments are carried out at intervals of four or five years, the ends
of the decline in each of them cannot be pinpointed. This precludes estimating any recent
increase in each series and disentangling it from the estimates of the preceding
downturn.

17. In interpreting the examples given below, it is important to bear in mind that only
17-year-olds still in school were tested in the National Assessment. As a result, the NAEP
results are likely to overestimate--perhaps by a sizable margin--the average level of
achievement attained by the entire cohort of 17-year-olds.

18. The subsequent interval from 1977 to 1981 showed little change, but it probably brackets
the end of the decline and therefore includes some of the subsequent upturn.

19. These and the following mathematics examples are taken from National Assessment
of Educational Progress, Changes in Mathematical Achievement, 1973-1978 (Denver:
NAEP/Education Commission of the States, 1979).
TABLE III-3.  SUMMARY OF NATIONAL ASSESSMENT RESULTS
              IN THREE SUBJECTS, AGES 9, 13, AND 17
              (Average percent of items correctly answered)


Subject               Age 9          Age 13         Age 17

Mathematics a/
  1972                56.7           68.6           64.0
  1977                55.4 ?/        66.6 ?/        60.4 ?/
  1981                [missing]      [illegible]    [illegible]

Reading d/
  1970                64.0           60.0           68.9
  1974                65.2 ?/        59.9           69.0
  1979                67.9 c/        60.8           68.2

Science
  1969                61.0           60.2           45.2
  1972 e/             59.8 c/        58.5 c/        42.5 c/
  1972 f/             52.3           54.5           43.4
  1976                52.2           53.8           41.5 c/


SOURCES: CBO calculations based on National Assessment of Educational Progress,
Three National Assessments of Reading (1981), Tables 2, 4, and 6;
The Third National Mathematics Assessment: Results, Trends,
and Issues (1983), Tables 5.1 and 6.2, and Mathematical Technical Report:
Summary Volume (1980), Tables 2, 3, and 4; and Three National Assessments
of Science (1978), Table A-1 (Denver: NAEP/Education Commission of the
States).

a. 1977 and 1981 scores reflect all items used in those two assessments. 1972 scores are
obtained by subtracting from 1977 scores the change between 1972 and 1977 on all items
used in those two years.

b. Change from preceding test marginally significant, p less than .10.

c. Change from preceding test statistically significant, p less than .05.

d. All scores reflect all items used in all three years.

e. Reflects only test items shared with 1969.

f. Reflects only test items shared with 1976.
Student achievement also dropped on certain NAEP items that were
less tied to concrete applications. For example, the proportion of
17-year-old students correctly finding a missing numerator in an equivalent
fraction fell from 82 percent to 72 percent. The proportion who could solve
for x and y in a system of linear equations dropped by a third, from 18
percent to 12 percent.

On the other hand, success rates on some items did not decline--an
optimistic note that is tempered by the fact that in many instances the rate
was poor in both years. For example, in both 1973 and 1978, about 20
percent of students successfully graphed the equation y = 2x + 1. About 15
percent and 12 percent could identify the slope and intercept, respectively,
of the equation 2y = 6x ■ 8. Five percent ascertained the equation of a line
when both the x- and y-intercepts were given.

Reading. In contrast to mathematics, the first three NAEP reading
assessments showed no substantial overall decline in the achievement of
17-year-olds (see Table III-3). This pattern is inconsistent with a variety of
other tests that showed substantial declines in reading and reading-related
skills. The results of those other tests, however, have not been published in
a form that permits comparison with a concrete achievement criterion.

On the other hand, a decline was apparent in one of the specific
reading skill areas tapped by the NAEP--inferential comprehension (that is,
comprehension that requires going beyond the information explicitly stated
in the question). This discrepancy is discussed in a later section.

Science. Over the seven-year span covered by the first three NAEP
assessments of science--1969, 1972, and 1976--the average score of
17-year-olds dropped 4.6 percentage points, or about 10 percent (see
Table III-3).
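
The text reports some changes in percentage points and others as relative
(percent) changes, and the two are easily confused. The short sketch below
is only an illustration with hypothetical function names. It converts the
point changes cited for 17-year-olds into relative changes, using the 1969
science average of 45.2 percent and the mathematics averages of 64.0 and
60.4 percent from Table III-3.

```python
def point_change(start_pct: float, end_pct: float) -> float:
    """Change in percentage points between two percent-correct figures."""
    return end_pct - start_pct


def relative_change(start_pct: float, end_pct: float) -> float:
    """Change expressed as a percent of the starting figure."""
    return 100 * (end_pct - start_pct) / start_pct


# Science, age 17: a 4.6-point drop from the 1969 average of 45.2 percent.
science_start = 45.2
science_end = science_start - 4.6

# Mathematics, age 17: 64.0 percent correct falling to 60.4 percent.
math_start, math_end = 64.0, 60.4

print(f"science: {point_change(science_start, science_end):.1f} points, "
      f"{relative_change(science_start, science_end):.1f} percent")
print(f"mathematics: {point_change(math_start, math_end):.1f} points, "
      f"{relative_change(math_start, math_end):.1f} percent")
```

On this arithmetic, the 4.6-point science drop is about a 10 percent
relative decline, while the 3.6-point mathematics drop is a relative
decline of under 6 percent, because it starts from a higher base.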

As in the case of the mathematics assessment, the low success rate on
certain items is as striking as the decline. One NAEP item, for example,
asked, "Which of the following happens when any combustion reaction takes
place?" The correct choice--that heat is evolved--was selected by about 68
percent of 17-year-olds in 1969 and by about 54 percent in 1977. Another
item asked for explanation of the statement that the relative humidity is 50
percent. About 47 percent of students in 1969 and 42 percent in 1977
selected the correct answer--"The atmosphere contains half as much
water as it could at its present temperature." 20/

Social Studies. The NAEP citizenship and social studies assessments in
1968, 1971, and 1975 showed sizable declines in the proportion of
17-year-olds correctly answering some items assessing knowledge of the
Constitution, the structure and function of government, the political
process, and international affairs. A smaller number of items, however,
showed increases.

In one example, the proportion of students answering that a statement
of civil rights can be found in the Constitution dropped from 86 percent to
81 percent between 1971 and 1976. 21/ The proportion correctly answering
the question "The Congress of the United States is made up of two parts.
One part is the House of Representatives. What is the other part?" fell
from 94 percent to 88 percent from 1968 to 1976. (The proportion choosing
the most popular incorrect answer--the Supreme Court--doubled to 8
percent during that period.) The proportion recognizing that the Congress
was part of the legislative branch of government dropped during the same
time, from 84 percent to 74 percent. Fifty-four percent of 17-year-olds in
1968, but only 35 percent in 1975, recognized that the circumstance of a
state having more Senators than Representatives occurs as a result of low
population. The proportion able to define "democracy" declined from 86
percent to 74 percent between 1968 and 1975.



THE RECENT UPTURN IN ACHIEVEMENT 



Since the end of the achievement decline, the general trend has been a 
marked upturn in average achievement. In some instances, the rate of 
increase has been comparable to or even greater than the rate of decrease 
during the later years of the decline, and average scores on some tests have 
approached or exceeded their predecline high points. Moreover, the pattern 



20. National Assessment of Educational Progress, Three National Assessments of Science:
Changes in Achievement, 1969-77 (Denver: NAEP/Education Commission of the States,
1978).

21. This and the following examples are taken from National Assessment of Educational
Progress, Changes in Political Knowledge and Attitudes, 1969-76 (Denver:
NAEP/Education Commission of the States, 1978).
of the trends among tests administered at different ages suggests that
some of the test batteries that have seen only a modest upturn to
date--most notably, the SAT--might show marked increases in the next several
years.

In contrast, there are a number of important exceptions to this
optimistic picture. Scores on the American College Testing Program
college admissions tests have yet to turn up substantially. A statewide
assessment program in Pennsylvania has shown stable scores in the lower
grades and slight deterioration at the secondary level in recent years. 22/
The California statewide assessment also has shown no upturn among
seniors, though it has shown increases in the lower grades. 23/

Much of the variation in recent trends appears linked to the age of the
students: tests given to older students have generally increased less in total
than have those administered to younger children. At one extreme, some
tests administered in the elementary grades have risen to their highest
levels on record--a span of as much as three decades. At the other pole,
the generally better known tests administered in the high school grades
(such as the SAT) have generally shown more modest gains.

The smaller total upturn to date in the higher grades appears to
reflect the shorter time since the upturn began in those grades, rather than
a lesser rate of improvement. The upturn, like the end of the decline, shows
a cohort pattern, and fewer of the cohorts producing rising scores have yet
reached the higher grades. (The relationships between age and the
subsequent upturn are discussed further in Chapter IV.)

This pattern suggests that scores on tests administered in the higher
grades might rise further in the coming years. That is, the cohorts
responsible for the most recent rise in scores in the earlier grades might be
expected to produce similar gains as they move through the higher grades.
The cohort pattern notwithstanding, however, any number of factors could
cause future trends in the higher grades to diverge from the recent trends
produced by those same cohorts in the earlier grades.



22. Robert Coldiron, Pennsylvania State Department of Education, personal communication,
January 1985.

23. California State Assessment Program, unpublished tabulations.
Has the Upturn Ended?

The most recent National Assessment of reading found that the average
reading proficiency of nine-year-olds was largely unchanged between 1979
and 1983, while the achievement of older students continued to rise. 24/ That
is, the birth cohort of 1974 showed no gain over the birth cohort of 1970.

Given the cohort pattern evident in the achievement upturn, this
pattern--if it appears on other tests and is maintained--suggests that the
upturn is, for the moment, over in the youngest age groups and that it will
end fairly soon in the higher grades (as the birth cohorts that were nine
years old between 1979 and 1983 pass through the grades). Tests administered
to eighth graders would be expected to level off in the 1983 to 1987
period, while scores of seniors would level off between 1987 and 1991.
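
The timing in this projection follows from simple cohort arithmetic. The sketch below is illustrative only: the typical ages assigned to eighth graders (13) and seniors (17) are assumptions made for the example, not figures taken from the report, and the function and variable names are hypothetical.

```python
# Assumed typical ages at testing (illustrative, not taken from the report).
TYPICAL_AGE = {"nine-year-olds": 9, "eighth graders": 13, "seniors": 17}


def years_tested(birth_cohorts, age):
    """Return (first, last) year in which the given birth cohorts reach `age`."""
    return (min(birth_cohorts) + age, max(birth_cohorts) + age)


# Birth cohorts that were nine years old between 1979 and 1983.
plateau_cohorts = range(1970, 1975)

# Years in which those same cohorts are tested at each later age.
windows = {
    group: years_tested(plateau_cohorts, age)
    for group, age in TYPICAL_AGE.items()
}

for group, (first, last) in windows.items():
    print(f"{group}: {first} to {last}")
```

Under these assumed ages, the same five birth cohorts reach the eighth grade four years after they are nine and the senior year eight years after, which is the spacing the projection relies on.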

Whether this leveling off is a general phenomenon, however, is
unclear. No other national data are available to test it, and state-level data
are inconsistent. The proportion of New York third-grade students passing
the state reference points in mathematics and reading, for example, has
been stable since the 1970 and 1971 birth cohorts (see Figure B-5 in
Appendix B). On the other hand, average scores in the elementary grades in
Iowa have continued to rise, even in the most recent (1984-1985) year of
data (see Figure III-2). In the next several years, National Assessments will
take place in other subject areas, which will provide nationally
representative data indicating whether this leveling off is a general
occurrence.



DIFFERENCES IN TRENDS AMONG TESTS

Recent achievement trends have varied greatly from one test to another.
For example, comparisons of recent trends on the SAT, the ACT, and
standardized tests given to high school juniors and seniors as a whole show
many discrepancies from one test to another (see Table III-4). This
variation indicates that no single test, taken alone, is an adequate indicator
of overall achievement trends. Indeed, in the absence of a clear understanding
of the variations in the trends from one test to another, even a few
taken together cannot always be assumed to be a sufficient indicator.



24. National Assessment of Educational Progress, The Reading Report Card: Progress
Toward Excellence in Our Schools (Princeton: NAEP/Educational Testing Service, 1985).

This variation apparently reflects differences both among the tests
themselves and among the students taking them. The precise role of each is
unclear, however, and some of the specific differences between tests are
hard to explain. For example, although the ACT sample is in some
important respects comparable to the SAT group and underwent some of the
same compositional changes as affected the SAT, the trends on the two
tests are markedly different. 25/ Conversely, although the Iowa ITED is
substantively similar to the ACT and was presumably free of many of the
compositional changes that biased the SAT and ACT trends, it showed total
declines roughly as large as those shown by the SAT (see Table III-2).



Subject Areas 

The relative severity of the decline in different subject areas has been the
focus of considerable discussion, in terms of both explanations of the trends
and debates about appropriate responses. Debate has focused not only on
specific subject areas, but also on two broad categories of subjects: those
primarily taught "directly" in school, and those that are to a substantial
degree taught "indirectly" both in school and elsewhere. 26/ Some people
would argue, for example, that certain mathematical skills--such as
converting fractions to decimals or solving algebraic equations--are taught
primarily in school through formal instruction and drill. In contrast, a larger
proportion of vocabulary knowledge is presumably learned as an incidental
result of daily experience at home and elsewhere. For this reason, a larger
decline in the "indirectly" taught subjects might imply that the decline was
attributable more to changes in student characteristics or to broad social
changes than to changes in schooling, while larger declines in "directly"
taught subjects would implicate schooling. 27/



25. Compositional changes affecting means are discussed in Munday, Declining
Admissions Test Scores.

26. Donald Rock and others, Factors Associated with Decline of Test Scores, p. 6.

27. While few people would argue with the idea that students learn a larger proportion
of their vocabulary than of their mathematical skills outside of school, the observed
relationships between achievement in different subject areas and home and school
characteristics are not clear-cut. For example, an analysis of the relative size of home
and school effects on achievement in several countries found that schooling effects were
indeed larger in science than in reading among 10-year-olds but not among 14-year-
olds (James Coleman, "Methods and Results in the IEA Studies of Effects of School
on Learning," Review of Educational Research, vol. 45, Summer 1975, pp. 355-386, Tables
2 and 3).

TABLE III-4.  RECENT TRENDS ON STANDARDIZED TESTS AMONG HIGH SCHOOL
              SENIORS AND JUNIORS, WITH TRENDS OVER THE SAME PERIODS
              ON THE SAT a/

                                                        Change
                                                       (Standard
Test                          Subject                  Deviations)

                            1970 to 1983

National Assessment           Reading                     .10
SAT                           Verbal                     -.26
ACT                           English                     .02
ITED-Iowa Grade 12            Vocabulary b/              -.08
                              Reading c/                 -.12
SAT                           Mathematics                -.14
ACT                           Mathematics                -.23
ITED-Iowa Grade 12            Mathematics b/             -.03

                            1971 to 1979

NLS to HSB                    Vocabulary                 -.22
                              Reading                    -.21
SAT                           Verbal                     -.26
NLS to HSB                    Mathematics                -.14
SAT                           Mathematics                -.15

                            1970 to 1981

Illinois Decade Study d/      English 1                  -.38
                              English 2                  -.49
SAT                           Verbal                     -.28

(Continued)

TABLE III-4.  (Continued)

                                                        Change
                                                       (Standard
Test                          Subject                  Deviations)

                       1970 to 1981 (cont'd.)

Illinois Decade Study         Mathematics 1              [illegible]
                              Mathematics 2              [illegible]
SAT                           Mathematics                [illegible]

SOURCES: CBO calculations based on National Assessment of Educational Progress, The
         Reading Report Card; Albert Beaton, NAEP/Educational Testing Service,
         personal communication, December 1985; Hunter M. Breland, The SAT Score
         Decline, Table 1; College Board, National College-Bound Seniors, 1978 and
         1985; L. A. Munday, Declining Admissions Test Scores, and American College
         Testing Program, National Trend Data for Students Who Take the ACT
         Assessment; Iowa Testing Programs, "Mean ITED Scores by Grade and Subtest
         for the State of Iowa: 1962-Present;" Robert Forsyth, Iowa Testing Programs,
         personal communication, August 1984; Donald A. Rock, Ruth B. Ekstrom,
         Margaret E. Goertz, Thomas L. Hilton, and Judith Pollack, Factors Associated
         with Decline of Test Scores of High School Seniors, 1972 to 1980 (Washington,
         D.C.: Center for Statistics, U.S. Department of Education, 1985); Student
         Achievement in Illinois, 1970 and 1981 (Springfield: Illinois State Board of
         Education, 1983).

a. The dates used in each set reflect the longest portion of the 1970-1983 period for which
   data are available. The NLS/HSB and Illinois Decade data are available only for the
   periods indicated. Comparisons extending past 1979 generally include a period of
   increasing scores.

b. These small changes in the ITED reflect substantial declines that were nearly offset
   by gains since 1978 and 1979.

c. This reflects the "Interpretation of Literary Materials" test. Reading is also tested on
   other tests in the ITED battery.

d. High school juniors only. SAT comparisons are therefore one year later.

Among the tests assessed here, no single subject area consistently
showed the largest drop, and the decline was not consistently larger among
either directly or indirectly taught subjects. In a majority of the tests, the
drop was largest on language-related tests such as verbal reasoning,
language usage, vocabulary, and reading. The exceptions were frequent
enough, however, to suggest that this pattern is more a reflection of the
particular tests than an underlying characteristic of the achievement
decline. 28/ Indeed, a different assortment of tests--if more were
available--might show a very different aggregate ranking of the decline in
different subject areas.
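
Comparisons such as those in Table III-4 rest on expressing each score change as a fraction of a standard deviation, so that tests reported on different scales can be set side by side. A minimal sketch of that conversion follows; the means and standard deviations used are invented for illustration and are not data from any of the tests discussed here, and the function name is hypothetical.

```python
def standardized_change(mean_start, mean_end, sd):
    """Express a change in mean score as a fraction of a standard deviation."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_end - mean_start) / sd


# Hypothetical test A, reported on a scale with a standard deviation of 100.
change_a = standardized_change(mean_start=500.0, mean_end=474.0, sd=100.0)

# Hypothetical test B, reported on a scale with a standard deviation of 5.
change_b = standardized_change(mean_start=20.0, mean_end=18.8, sd=5.0)

# On their raw scales the drops (26 points and 1.2 points) look nothing alike;
# in standard-deviation units they are nearly the same size.
print(round(change_a, 2), round(change_b, 2))
```

The conversion makes trends comparable in size, but it does not remove differences in what the tests measure or in who takes them.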

Thus, for example, language-related tests showed the largest drops on
the SAT, a nationally representative comparison of high school seniors in
1971 and 1979 (the NLS and HSB comparison), and in some of the Iowa data.
Conversely, mathematics showed the steepest decline in other Iowa data and
in the national normings of the California Achievement Test. Moreover,
some of the language-related tests that showed particularly large declines
(such as the vocabulary test in the NLS and HSB comparison) tap indirectly
taught subjects, while others (such as the language test in the ITBS data and
the ITED expression test) are clearly much more reliant on formal instruction.
(For more detail on the relative size of the decline in different subject
areas, see Appendix C.)

Underlying this seeming lack of consistency is the fact that achievement
in any one subject can be defined--and measured--in many different
ways, and the variations in measurement can be large enough to create very
different trends. Thus, to speak of "the decline in mathematics achievement"
is misleading. It is more accurate to speak of the decline in the
mathematics skills measured by a specific test, and one should bear in mind
that other tests might yield very different trends.

Trends in average mathematics achievement of Iowa students clearly
illustrate the effect of test differences on the severity of the decline. 29/



28. This discussion reflects only tests for which standard deviations are available, since
the trends in different subject areas are made comparable by expressing them as fractions
of a standard deviation. The National Assessment is therefore excluded, since standard
deviations from previous assessments were not all retained by the NAEP staff
(Lawrence Rudner, Office of Educational Research and Improvement, U.S. Department
of Education, personal communication, January 1985).

29. Since most students in Iowa are tested with the ITBS through grade 8 and with the ITED
in grades 9 through 12, differences between trends in Iowa on the grade 8 ITBS and
the grade 9 ITED reflect little other than the differences in the tests themselves. The
scores are based on almost the same group of students at nearly the same point in their
school careers.

Over the entire period of the decline, the eighth-grade Iowa ITBS dropped
substantially more in mathematics than in other subjects. In contrast, the
ninth-grade Iowa ITED showed somewhat less decline in mathematics than
in social studies or reading (the "interpretation of literary materials"). Over
the whole period, the mathematics decline on the grade-eight ITBS was
nearly half a standard deviation, or about .036 standard deviations per year.
On the ITED, the total mathematics decline and the annual rate of the
decline were both roughly half as large (see Figure III-5). The explanation of
this difference might lie in the construction of the tests: the ITBS is roughly
split between concept items (which are highly curriculum bound) and
applications items, while the ITED places much greater emphasis on the
latter. 30/
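
A rough consistency check on these figures can be sketched as follows. It assumes that "nearly half a standard deviation" is taken as 0.5 and that "roughly half as large" is taken as exactly half; both are simplifications made only for this example, and the variable names are hypothetical.

```python
# Grade-eight ITBS mathematics: total decline and annual rate (in SD units).
itbs_total_decline = 0.5      # "nearly half a standard deviation" (rounded up)
itbs_annual_rate = 0.036      # standard deviations per year

# Grade-nine ITED mathematics: both figures "roughly half as large".
ited_total_decline = itbs_total_decline / 2
ited_annual_rate = itbs_annual_rate / 2

# Dividing a total decline by its annual rate gives the implied duration.
itbs_years = itbs_total_decline / itbs_annual_rate
ited_years = ited_total_decline / ited_annual_rate

print(f"ITBS implied duration: {itbs_years:.1f} years")
print(f"ITED implied duration: {ited_years:.1f} years")
```

Because the total and the annual rate are halved together, the implied duration is the same for both tests (a little under 14 years with these rounded inputs). On this reading, the two tests differ in how steep the decline was rather than in how long it lasted.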



Level and Type of Skill 

Evidence of the trends in different types and levels of skills is of two types:
direct comparisons of different items within individual tests, and indirect
inferences from comparisons of different tests--such as those in different
subject areas or given at different grade levels. Direct comparisons of
items can be carried out on any test, but little such analysis is currently
available. Indirect inferences are therefore also noted in this section.

The Decline. That the overall drop in achievement entailed sizable declines
in higher-level skills, such as inference and problem-solving, is beyond
question. 31/ The extent to which declines occurred in more basic skills,
such as simple arithmetic computation, is less clear. While some tests
showed substantial declines in basic skills, other indices of basic skills
showed little or no drop. In the aggregate, the evidence suggests that
declines in the more basic skills might have been generally less severe than
in higher-order skills, but not without exception.



30. Robert Forsyth, Iowa Testing Programs, personal communication, February 1985.

31. While the evidence leaves no doubt that substantial declines occurred in some higher-
level skills, not all higher-level tests showed declines. The most notable exception is
the Project TALENT 15-year retests, which showed increases in abstract reasoning
and creativity in grades 9 through 11 between 1960 and 1975 (Table C-1 in Appendix
C). This exception, however, might be artifactual. The starting point of the comparison
--1960--antedated the predecline peak in achievement, thus confounding earlier growth
in achievement with the decline (Cleary and McCandless, Summary of Score Changes
(in Other Tests)). In addition, the 15-year retest suffers from two serious threats to
validity and representativeness: a very small sample (only 17 schools), and meager
assessment of changes in school characteristics that might bias the results.

Figure III-5.
Iowa Mathematics Achievement, Differences from Lowest Year
[Chart not reproduced. Horizontal axis: test year.]

SOURCES: CBO calculations based on "Mean ITED Test Scores by Grade and Subtest for the State of Iowa" and
         "Iowa Basic Skills Testing Program, Achievement Trends in Iowa: 1955-19[illegible]" (Iowa Testing Programs,
         undated and unpublished tabulations); Robert Forsyth, Iowa Testing Programs, personal
         communication, April 1984; and A. N. Hieronymus, E. F. Lindquist, and H. D. Hoover, Iowa Tests of
         Basic Skills: Manual for School Administrators (Chicago: Riverside, 1982).



A greater decline in higher-order skills is apparent in the performance
of 17-year-olds on the first two NAEP mathematics assessments (1972-3 and
1977-8), which span the last years of the decline. Performance on these
tests was tabulated separately for four types of skills:

o Knowledge: "recall of facts and definitions," including facts of
the four basic arithmetic operations and measurement.

o Skills: "the ability to use specific algorithms and manipulate
mathematical symbols." This domain includes "computation with
whole numbers, fractions, decimals, (and) percents...; taking
measurements; converting measurement units; reading graphs and
tables; and manipulating geometric figures and algebraic
expressions." 32/

o Understanding: items "implying a higher level of cognitive
process than simply recalling facts or using algorithms." Items in
this domain required explanation or illustration of various skills
and "transformation" of mathematical knowledge.

o Applications: items requiring the use of the preceding three types
of skills, usually in problem-solving. 33/

Average performance in the simplest domain--mathematical
knowledge--did not change at all during the five-year interval (see Table III-5).
(An increase in performance on items involving metric measures offset a
relatively small decline in the rest of this domain.) Both of the two highest
levels--understanding and applications--showed declines. Moreover, the
average performance in the applications domain was very low in both
years. 34/ The "skills" area showed a comparably large decline, but within
that area, the drop tended to be largest on the more complex items. 35/

The second international mathematics assessment by the International
Association for the Evaluation of Educational Achievement (IEA) yielded
results for grade eight that are comparable to the NAEP in this respect.
Average achievement in grade eight fell over the 18 years between the first
and second assessments, but the declines were greater "for more demanding
comprehension and application items than they were for computation
items." 36/ On the other hand, the same assessment found precisely the



32. At the simple pole, the "skills" domain incorporates items that would be considered
"basic skills" by all observers--for example, simple arithmetic operations. At the other
pole, it subsumes some fairly complex operations, such as solving a system of linear
equations for x and y and solving quadratic equations.

33. National Assessment of Educational Progress, Changes in Mathematical Achievement,
1973-78, p. xi.

34. Ibid., pp. 12-15.

35. Ibid., pp. 4-9.

36. F. Joe Crosswhite, John A. Dossey, Jane O. Swafford, Curtis C. McKnight, Thomas J.
Cooney, and Kenneth J. Travers, Second International Mathematics Study: Summary
Report for the United States (Champaign: Stipes Publishing Co., 1985), p. [illegible]. Given
the timing of the two assessments and the age of the students, the eighth-grade trends
in the international assessment probably combine several years of increasing
achievement with a longer, previous period of decline.

TABLE III-5.  NAEP MATHEMATICS CHANGES 1972-1977, AGE 17, BY AREA
              (Average percent of items correctly answered)

Area                    1972      1977      Change

Total                    52        48        -4 a/
Knowledge b/             63        63         0
Knowledge c/             63        62        [illegible]
Skills                   55        50        [illegible]
Understanding            62        58        -4 a/
Applications             33        29        -4 a/

SOURCE: NAEP, Changes in Mathematical Achievement, 1973-78, Tables 1, 4, 6, and 7.

a. Statistically significant, p less than .05.

b. Including metric measures.

c. Excluding metric measures.

d. Components do not yield stated change because of rounding.



opposite pattern among 12th-grade students: an increase in achievement
was seen in the more demanding comprehension questions and, for calculus
students, at the even more demanding application level. 37/ The 12th-grade
results, however, were in large part a reflection of the performance of
calculus students, who constitute a small and select part of the senior
class and whose performance may therefore say little about that of high
school students in general.

Evidence of a greater decline in higher-order skills also appeared in
the NAEP reading assessments. As noted earlier, 17-year-olds showed little
total change in reading between the 1970-71 and 1979-80 assessments. The
small (and statistically insignificant) change in total reading performance,
however, masks a somewhat larger (and statistically significant) decline in
inferential comprehension (see Table III-6). As defined in the NAEP,
inferential comprehension can be considered the highest-level skill tapped
by the test. It entails comprehending ideas that are not explicitly stated by
drawing inferences from material that is explicit. 38/ In contrast,
literal-comprehension scores changed by only a trivial amount, and reference
skills--also a more basic area--actually improved, albeit by a very small and
statistically insignificant amount.

37. [illegible]
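
The changes reported in Table III-6 can be recomputed from the two columns of percent-correct scores. The sketch below uses the values as they read in the table; the variable names are illustrative, and the check says nothing about statistical significance, which depends on sampling information not reproduced here.

```python
# Average percent of items answered correctly, age 17 (Table III-6):
# (first assessment, second assessment)
reading_scores = {
    "Total": (68.9, 68.2),
    "Literal Comprehension": (72.2, 72.0),
    "Inferential Comprehension": (64.2, 62.1),
    "Reference Skills": (69.4, 70.2),
}

# Change in percentage points, rounded to one decimal as in the table.
changes = {
    area: round(later - earlier, 1)
    for area, (earlier, later) in reading_scores.items()
}

# The area with the most negative change shows the largest decline.
largest_decline = min(changes, key=changes.get)

for area, change in changes.items():
    print(f"{area}: {change:+.1f}")
print("Largest decline:", largest_decline)
```

The small total change is an average of a larger drop in inferential comprehension and near-zero or positive changes in the more basic areas.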

Deterioration of higher-level skills is also apparent from declines on
tests that are designed specifically to tap them. 39/ The SAT is the most
salient example. As noted earlier, it is designed (and is generally
considered) to be a test that relies heavily on skills such as reasoning, problem-
solving abilities, and verbal relationships (such as are assessed by analogies).
The Illinois Decade Study (which used a test that was also developed by the
Educational Testing Service) provides another example. While the Decade
Study included many items that required that students know specific pieces
of information (such as rules of English usage, social-studies facts, and
mathematical terminology), it also relied heavily on inference. 40/ The
declines on the test were relatively large (see Table III-4 and Appendix C).

The relationship between age and the size of the decline--discussed in
Chapter IV--might also be indirect evidence of a lesser deterioration of
more basic skills. As noted earlier, declines in the first three grades tended
to be slight and short-lived and might best be seen as brief interruptions of
an otherwise steady upward trend in those grades. Since the curriculum in



38. NAEP, Three National Assessments of Reading, pp. 4, 25.

39. The tests noted here are all multiple-choice format. As noted in Chapter II, some people
have argued that multiple-choice tests are demonstrably limited in their ability to tap
many higher-order skills (for example, Norman Frederiksen, "The Real Test Bias:
Influences of Testing on Teaching and Learning," American Psychologist, vol. 39 (March
1984), pp. 193-202). Even if the tests noted here leave many relevant higher-order skills
unassessed or inadequately measured, however, few would dispute
that they do rely substantially on some higher-order abilities and that those abilities
may play a greater role in determining scores in these tests than in some others (such as
the NAEP literal comprehension reading subtest or the NAEP mathematics test as
a whole).

40. Illinois State Board of Education, Student Achievement in Illinois, 1970 and 1981,
Appendix A.

TABLE III-6.  NAEP READING CHANGES 1970-1979, AGE 17, BY AREA
              (Average percent of items correctly answered)

Area                            1970      1979      Change

Total                           68.9      68.2       -0.7
Literal Comprehension           72.2      72.0       -0.2
Inferential Comprehension       64.2      62.1       -2.1 a/
Reference Skills                69.4      70.2        0.8

SOURCE: NAEP, Three National Assessments of Reading: Changes in Performance,
        1970-80, Table 6.

a. Statistically significant, p less than .05.



those grades includes a large amount of basic skills--decoding and
literal comprehension in reading, memorization of basic arithmetic facts
[...]--the relatively favorable trends in [...].

The Upturn. The characteristics of the subsequent upturn are as
yet unclear, in part because scores on tests administered in high schools
have [...]. That the upturn is occurring in most tests
and at all grade levels--including the SAT in the last few years--suggests
that improvements are probably occurring at many skill levels. [...]
among tests administered to different grade
levels or among different groups of students.

The clearest, albeit incomplete, suggestions of relatively smaller increases
in higher-level skills are found in the most recent (1981-82) NAEP
mathematics assessment. Because the NAEP tests a nationally representative
sample of students and because it permits comparisons of changes in
various skill areas, it is a particularly important indicator of the mix of
skills comprised by recent trends. The NAEP found a sizable increase in the
[...] statistically insignificant gains among 9-year-olds [...]. The
improvement among 13-year-olds, however, was disturbing:

     They improved most on the knowledge, skills, and understanding
     exercises, and least on the applications exercises. Further
     study shows that their improvements in understanding came on
     exercises judged relatively easy by a panel of mathematics
     [...] performance levels on exercises calling for
     [...] understanding showed little or no improvement. 42/

On the other hand, recent gains among the highest-achieving students
on difficult tests--discussed in the following chapter--suggest improvement
in [...]. It is possible that some groups of the highest-achieving
students [...]. The data remain too limited to answer this question.



42. NAEP, Third National Mathematics Assessment, p. xv.

CHAPTER IV



GROUP DIFFERENCES IN
ACHIEVEMENT TRENDS



While the achievement decline and the subsequent upturn occurred among
most groups of students identifiable in the existing data, both trends varied
among different groups. Similarly, achievement trends have varied among
different types of communities and schools.

The most important differences in trends are:

o Greater declines on tests administered to older students;

o Relative gains by black and Hispanic students, compared with
nonminority students; and

o Relative gains in high-minority schools and schools in
disadvantaged urban communities compared with the nation as a whole.

In addition, there is some indication that students in the bottom fourth of
the achievement distribution gained ground relative to those in the top
fourth during part of the 1970s. The evidence on this point is inconsistent,
however, and it is not clear that this narrowing of the gap occurred on a
variety of tests or spanned more than a short period of time. Female
students also showed slightly sharper declines on language-related tests
(such as reading and vocabulary), but not on tests in other subject areas.
Private school students showed declines comparable to those among public
school students in reading and vocabulary, although evidence from a single
test suggests that the decline in mathematics achievement was considerably
smaller among private-school students.

DIFFERENCES IN TRENDS AMONG TYPES OF STUDENTS 



Variations in achievement trends were associated with age, sex, achievement
subgroup (that is, low versus high achievers), and race and ethnicity.

Both the decline in achievement and the subsequent upturn varied markedly
with the age of the students tested, but the effects of age appear to have
been different during the two periods.

The Decline. The total size of the decline was strongly related to age. In
general, tests administered to older students showed markedly larger total
declines than did tests administered in the early grades. 1/

The Iowa state data provide the best assessment of this question and
show a striking link between age and the size of the achievement decline
(see Figures IV-1 and IV-2). 2/ At one extreme, the decline in third-grade
scores was small and short-lived; it can be characterized as a slight dip
accompanying an eight-year hiatus in an otherwise unbroken, 30-year
increase in achievement. In standardized form, the total decline was only
about 0.07 standard deviation (depending on subject), and average scores are
now over a third of a standard deviation above the low point of the decline--
and more than three-fourths of a standard deviation above their level of



1. Although this conclusion is widely accepted, it is important to note that it is actually
based on fairly limited data. To offer a good test of the relationship between age and
the size of the decline, a data series should meet a set of criteria that few do. The data
series should include comparable tests administered to a range of ages, since a
comparison of different tests can confound differences between the tests themselves
with the effects of age. Scores should be presented in some form--such as standard
deviations or percentiles--that permits comparisons among grades. The data should
also extend back to the onset of the decline. Data that extend over a relatively short
period of time might tap a relatively steep portion of the decline in one grade and a
relatively gradual portion in another, thus biasing the comparison among age groups.
In addition, random year-to-year fluctuations in scores--reflecting either sampling
fluctuations or uncontrolled differences in tests--are more likely to bias conclusions
based on a relatively few years. Finally, the data should be annual, to confirm that
they subsume the entire decline and none of the upturn. Data that are collected
intermittently--such as the NAEP and norming data from commercial elementary and
secondary tests--can mix in varying periods of increasing scores for different age groups.
Intermittent data also might capture a relatively steep portion of the decline in one
grade but a comparatively gradual portion in another.

2. The best assessment of the effect of age is obtained within each test series--that is,
comparing ITBS scores in grades 3 through 8 with each other, and similarly comparing
ITED scores in grades 9 through 12. Even in this case, comparisons across the two tests
--for example, comparing grade 8 ITBS scores with the grade 9 ITED--confound
differences between the two tests with the effects of age. (See the discussion in
Chapter III of differences in trends among subject areas for a concrete example of
differences of this sort between the ITBS and ITED.)

Figure IV-1.
Iowa Composite, ITBS, Grades 3-8, Differences from Post-1964 Low Point
[Chart not reproduced.]

SOURCES: CBO calculations based on "Iowa Basic Skills Testing Program, Achievement Trends in Iowa:
         1955-19[illegible]" (Iowa Testing Programs, unpublished and undated material); and A. N. Hieronymus, E. F.
         Lindquist, and H. D. Hoover, Iowa Tests of Basic Skills: Manual for School Administrators (Chicago:
         Riverside, 1982).

Figure IV-2.
Iowa Composite, ITED, Grades 9-12, Differences from Lowest Year
[Chart not reproduced. Horizontal axis: test year.]

SOURCES: CBO calculations based on "Mean ITED Test Scores by Grade and Subtest for the State of
         Iowa: 1962-Present" (Iowa Testing Programs, unpublished and undated tabulations); and Robert Forsyth,
         Iowa Testing Programs, personal communication, August 1984.


1954. Thus, the median third-grader in Iowa today scores
higher than roughly 78 percent of his or her counterparts of three decades
past. Similarly, no sizable decline occurred in grade three in statewide
assessments in New York and California. 3/

The decline in eighth-grade Iowa scores, in contrast, was large enough
to depress composite achievement scores to their level of three decades ago
and long enough that recovery has as yet been incomplete. When put in
standard form, these differences appear even more striking. Eighth-grade
Iowa scores declined about a third of a standard deviation and have since
recovered only about two-thirds of what they lost. (Nonetheless, eighth-
grade scores are still about 0.2 standard deviation higher than they were 30
years ago, placing the median student this year at the 58th percentile
relative to achievement levels in 1954.)

3. [...] (Croton-on-Hudson, New York: [...]).

The National Assessment of Educational Progress (NAEP) also shows
only relatively few and small declines among nine-year-olds, relative to the
declines in the older groups. This pattern might in part reflect the timing of
the NAEP assessments, however, rather than--or in addition to--truly lesser
declines in the youngest age group. 4/
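
The percentile statements made above for the Iowa third- and eighth-grade scores can be checked with a normal approximation. This sketch assumes that scores are normally distributed with the same spread in both periods, an assumption made here for illustration rather than one stated in the report; the function name is hypothetical.

```python
from statistics import NormalDist


def percentile_of_median(shift_in_sd):
    """Percentile rank, in the earlier score distribution, of today's median
    student when the whole distribution has shifted by `shift_in_sd`
    standard deviations (normal approximation)."""
    return 100 * NormalDist().cdf(shift_in_sd)


# Eighth grade: about 0.2 standard deviation above the level of 30 years ago.
eighth_grade = percentile_of_median(0.2)

# Third grade: a shift of three-fourths of a standard deviation
# (the text says "more than" this, so the result is a lower bound).
third_grade = percentile_of_median(0.75)

# Eighth-grade recovery: two-thirds of a decline of one-third of an SD.
recovered = (1 / 3) * (2 / 3)

print(f"eighth grade: {eighth_grade:.0f}th percentile")
print(f"third grade: at least the {third_grade:.0f}th percentile")
print(f"recovered since the low point: about {recovered:.2f} SD")
```

Under this approximation a shift of 0.2 standard deviation corresponds to roughly the 58th percentile and a shift of 0.75 to roughly the 77th. The recovery of about 0.22 standard deviation is also consistent with the low point having been close to the level of three decades earlier.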

Periodic national norming data from commercial standardized elementary
and secondary tests also suggest both a lack of decline in the youngest
age groups and progressively larger declines in the remainder of the school-
age population. For example, the national ITBS norming data indicate that
in reading, the median third-grader's level of achievement increased by 4.3
months from 1955 to 1963, only 0.5 months from 1963 to 1970, and 3.7
months from 1970 to 1977. This change is consistent with the pattern in the
annual Iowa data--that is, a pause in achievement growth in the late 1960s
and early 1970s. In contrast, among sixth graders, a 2.2-month gain from
1955 to 1963 was followed by declines of 2.8 and 3.0 months in the following
seven-year periods. Among eighth graders, the drop was even more
substantial after 1970. 5/ The SRA achievement series showed composite
gains in all but one grade between 1962 and 1971. Between 1970 and 1977,
however, the trends varied greatly with grade level. In reading, for
example, the latter period included large gains (two-thirds of a standard
deviation or more) in grades one and two; more moderate gains in grades
three and four; small declines in grades five through eight; and larger drops
in the higher grades. 6/



4. Given the cohort pattern shown by the end of the decline, the various NAEP assessment
cycles probably began near or even at the end of the decline among nine-year-olds, and
thus the data most likely combine a few years of the decline with a longer period of the
subsequent increase. Since the NAEP assessments are conducted only at intervals of
four or five years, however, the precise end of the decline in that test series cannot be
firmly established, and the extent of this confounding therefore cannot be determined.

5. A. N. Hieronymus, E. F. Lindquist, and H. D. Hoover, Iowa Tests of Basic Skills: Manual
for School Administrators (Chicago: Riverside Publishing Company, 1982).

6. Science Research Associates, SRA Achievement Series, Technical Report #3 (Chicago:
SRA, 1981), Table 2; and Science Research Associates, unpublished tabulations. The
trends between the 1970 and 1977 school years reported here reflect normings conducted 
in the springs of 1971 and 1978 and are labeled in terms of those calendar years in the 
published data. 
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Although the achievement decline persisted longer in the hieher 
grades, the larger total drop in scores in those grades reflects more than the
longer duration. In addition, the decline appears to have been steeper (that
is, more rapid) in the higher grades. This rapidity is shown most clearly by
the Iowa data (both the ITBS and the ITED; see Figures IV-1 and IV-2). In
both tests, the decline in any grade was steeper than that in lower
grades. This difference in the rapidity of the decline, however, appears to
have been confined primarily to the earlier years of the decline.

Upturn. As noted in Chapter III, scores on tests administered to
younger children have risen substantially more in recent years, compared
with the decline in those grades, than have scores on tests administered in
the higher grades. This pattern can be seen clearly in the Iowa state trend
data (both the ITBS and ITED; see Figures III-2 and III-3):

o Grades 3, 4, 5, and 6 are now at their highest point in the three
decades of available data;

o Achievement in grades 7, 8, 9, and 10 has rebounded strongly but
is not yet at its earlier high (although grade 9 is nearly at that
level);

o Grade 12 achievement has begun rising but remains near its low
point.

The well-known SAT trend parallels the twelfth grade Iowa trend in this
regard: achievement has been climbing for several years but remains only
modestly above its low point (see Figure III-4). Similar patterns, although
often less clear-cut, appear in a number of other data bases as well, such as
the Virginia State assessment data and the NAEP reading assessment.
(Some achievement test series, however, are inconsistent with this pattern.
For example, in the NAEP mathematics assessment, the recent increase in
scores was markedly greater among 13-year-olds than among 9-year-olds.) 7/

The greater total rise in scores to date in the younger grades appears 
largely to reflect a longer period of rising scores in those grades rather than 
a greater rate of improvement than in the higher grades. The upturn in 
scores followed quickly after the end of the decline and shows the same 



7. National Assessment of Educational Progress, The Third National Mathematics






Figure IV-3.

ITBS Composite, by Birth Year

[Figure not reproduced; horizontal axis: Birth Year]

SOURCES: CBO calculations based on "Iowa Basic Skills Testing Program, Achievement Trends in Iowa,
1955-1985" (Iowa Testing Programs, unpublished and undated material); and A. N. Hieronymus, E. F.
Lindquist, and H. D. Hoover, Iowa Tests of Basic Skills: Manual for School Administrators (Chicago:



cohort pattern (see Appendix B). Among children born after 1963 or
so (that is, beginning with the cohorts that entered school in the late
1960s), each cohort has tended to outscore those preceding it. The smaller
gains in the higher grades thus appear to reflect, at least in substantial
part, the smaller number of higher-scoring cohorts that have reached senior
high school.

This trend can be seen in the Iowa data, which suggest (if trends in
Iowa are indicative of national trends in this regard) that gains have been
comparably fast, or even more rapid, in the higher grades than in the lower
ones. 8/ On the ITBS, each birth cohort since the onset of the score increase
has tended to produce slightly larger increases in grades six through eight
than in grades four and five (see Figure IV-3; vertically adjacent lines that



8. This conclusion reflects changes expressed in standard deviations and only comparisons
within a single test. Trends on the ITED are not compared with those on the ITBS.
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Figure IV -4. 

ITED Composite,
by Birth Year

[Figure not reproduced; horizontal axis: Birth Year]

SOURCE: ... personal communication, August 1984.



are parallel indicate comparable gains by the same cohort in different
grades). In this respect, the upturn in the ITBS has been largely symmetrical
with the last years of its downturn. On the ITED, the gains produced by a
given cohort have remained roughly comparable as that group moved from
grade 9 through grade 12 (see Figure IV-4). For several cohorts after the
upturn began, these gains were also basically symmetrical with the
corresponding last years of the decline, but the most recent cohorts to reach the
high-school years (those born in 1966 through 1969) have produced gains
[...] the corresponding decline produced by the birth cohorts
of the mid-1950s.



Sex 



While the achievement decline was sizable among students of both sexes, it
appears to have been more severe among female students in the case of
language-related tests (such as vocabulary, reading, and the SAT-Verbal). On the
other hand, once the effects of changes in the composition of the test- 
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taking group are taken into Account, the declines among males appear to 
have been comparable to or even slightly larger than those among females in
mathematics and science. 9/

The average SAT scores of women dropped substantially more than
those of men. This difference by sex was large on the verbal scale (after
1967, women dropped 60 points, compared with the 36-point drop in the
average score of males) but far smaller on the mathematical scale. 10/ The
average score of female ACT candidates also dropped more than that of
males, and the difference was greater on the English test than in
mathematics. 11/

In both cases, however, the apparently greater decline among women
might simply be a reflection, at least in part, of the changing mix of male
and female students taking the tests. Women have constituted a growing
share of all students taking both the SAT and the ACT. Women constituted
42.7 percent of SAT candidates in 1960, 47.5 percent in 1970, and 51.8
percent in 1983. 12/ Similarly, women constituted 45 percent of ACT
candidates in 1964 and 54 percent both in 1975 (the year that ACT scores



9. On the ACT, the greater decline among women was most pronounced in social studies.
In the NAEP, however, the only comparison in social studies that showed relatively
greater trends in one gender than the other (citizenship questions at age 13) showed
females gaining relative to males. Comparable tabulations from other tests are
unavailable. The sharp decline of women on the ACT social studies test therefore might
be just a reflection of the compositional changes discussed below. L. A. Munday,
Declining Admission Test Scores, Research Report #71 (Iowa City: American College
Testing Program, February 1978); National Assessment of Educational Progress,
Changes in Political Knowledge and Attitudes, 1969-76 (Denver: NAEP/Education
Commission of the States, March 1978).

10. College Entrance Examination Board, College-Bound Seniors, 1984 (New York: The
College Board, 1984).

11. These patterns reflect changes in ACT scores from 1965 to 1975, the latter being the
year in which composite ACT scores reached their lowest point. The data from 1965
to 1969 are slightly inconsistent with the later data because the former include residual
on-campus testing. The former are taken from Munday, Declining Admission Test Scores;
the latter are from unpublished ACT tabulations.

12. Advisory Panel on the Scholastic Aptitude Test Score Decline, On Further Examination
(New York: College Entrance Examination Board, 1977), p. 16; and College Board,
College-Bound Seniors, 1984.
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^ nnol r ° P0, ? t [ nnd » 1983 -^ This growing shore suggests that 
the pool of women taking the tests might have become relatively less
select, a change that would lead to greater score declines among women
than among men.
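
The compositional argument can be illustrated with a deliberately simplified model. The sketch below assumes, hypothetically, that achievement is standard normal and that test takers are exactly the highest-achieving fraction of a group; real candidate pools are not selected this cleanly, and the participation rates shown are illustrative rather than taken from SAT or ACT data. Under those assumptions, the average of the tested group falls as the tested fraction grows, even though the underlying population is unchanged.

```python
from statistics import NormalDist


def mean_of_top_fraction(fraction: float) -> float:
    """Mean, in standard-deviation units, of the top `fraction` of a
    standard normal population (hypothetical selection model)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1.0 - fraction)
    # Mean of a standard normal truncated below at `cutoff`.
    return std_normal.pdf(cutoff) / fraction


if __name__ == "__main__":
    # Illustrative participation rates only.
    for share in (0.30, 0.40, 0.50):
        print(f"top {share:.0%} tested: mean = {mean_of_top_fraction(share):+.2f} SD")
```

The point of the sketch is only directional: a less select pool lowers the group average without any change in what individual students know.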

Trends in scores on other tests, however, suggest that part of the
greater decline among women is independent of these compositional
changes, reflecting some other, as yet unidentified, factors. Data from a
few nationally representative tests, which are largely free of these
compositional changes, also show greater declines among female students
on language-related tests. 14/ On the other hand, in mathematics and
science the decline in the scores of male students was typically as large or
larger. For example, a comparison of the high-school classes of 1972
and 1980 found that women showed a greater decline in vocabulary and a
slightly larger drop in reading, while men showed a larger decline in
mathematics. 15/ Seventeen-year-olds showed a similar pattern in the
NAEP over a five- to nine-year span in the 1970s. Women showed a greater
decline on both the literal comprehension and inferential comprehension
components of the reading assessments, while men evidenced slightly
greater declines in mathematics and science. 16/ Although these
differences by sex were very small, they might have been larger if the
comparisons had spanned the entire period of the achievement decline
rather than only a portion of it.



13 ' SLSfi,^ ™ Am6riMn Coll... Testing Profram , 

14. 



15. 



16. 



r plM couId ihow 8 chaB ^ «* if *Ss 

Donald A. Rock, Ruth B. Ekstrom, Margaret E. Goertz, Thomas L. Hilton, and Judith
Pollack, Factors Associated With Decline of Test Scores of High School Seniors, 1972
to 1980 (Washington, D.C.: Center for Statistics, U.S. Department of Education, 1985).

National Assessment of Educational Progress, Three National Assessments of Reading
(Denver: NAEP/Education Commission of the States, 1981), Tables A-9, A-10, and
A-11; Mathematical Technical Report: Summary Volume (Denver: NAEP/Education
Commission of the States, 1980), Table 4; Three National Assessments of Science
[...] (Denver: NAEP/Education Commission of the States [...]
men droned ofKi<,n "> of women increased, while those of 
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Achievement Subgroups 

A current and widely held view is that the decline in achievement was more 
severe among relatively high-achieving students than among those at the 
lower end of the achievement distribution. This belief has led some 
observers to credit the educational system with improving its services to 
low-achieving students, or, alternatively, to fault it for allowing its services 
for more able students to deteriorate. 17/ 

It is not clear, however, that trends have been consistently more 
favorable among lower-achieving than among higher-achieving students over
the entire period of the achievement decline and subsequent upturn. When a
wide range of tests is considered, a more complex, and sometimes
inconsistent, pattern emerges. Moreover, there are major gaps in the available
data, such as the sparseness of relevant comparisons during the first half of
the achievement decline, and a very limited picture of the relative
performance of achievement subgroups during the recent years of increasing
achievement. In addition, both apparent changes in the gap between 
achievement subgroups and inconsistencies in the data about these groups 
must be taken cautiously because both consistencies and variations in the 
data can be artifacts of technical aspects of the tests. 

As discussed in Chapter II, a number of technical aspects of tests
influence conclusions about relative trends in high- and low-achieving 
groups. Differences in the scaling of test scores can markedly affect such 
judgments. In addition, a single test is unlikely to be a comparably 
comprehensive measure of mastery at two very different levels of achieve- 
ment and therefore may understate the relative change of students at one 
level. The tabulation and reporting of results further complicates compari- 
sons, since information on the additional items correctly or incorrectly 
answered is rarely reported, particularly for achievement subgroups. This 
lack of information makes it hard to judge whether changes in the average 
scores of achievement subgroups are substantively comparable, even when 
they seem similar numerically. Nonetheless, the broad range of tests 
suggests the following generalizations. (See Appendix D for additional 
details.) 



17. See, for example, statement by Archie E. Lapointe, Executive Director, National 
Assessment of Educational Progress, before the Subcommittee on Elementary,
Secondary, and Vocational Education, Committee on Education and Labor, January
31, 1984; and William W. Turnbull, Changes in SAT Scores: What Do They Teach Us?
(report to the College Board-ETS Joint Staff Research and Development Committee,
forthcoming). 
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It is dear that the achievement decline and the subsequent upturn 
appeared among both low- and high-achieving students. Whether the decline
began at the same time in different achievement subgroups, however, and
whether the drop was comparable among those subgroups during the early
years of the decline (the 1960s and the first years of the 1970s), remain
unknown. Tabulations comparing achievement subgroups during those years
are largely restricted to unrepresentative groups of students, for example,
students taking the SAT, classified in terms of their rank on
that test.

During the mid- and late 1970s (that is, during the end of the
achievement decline and the beginning of the subsequent upturn), students
in the top achievement quartile (the top fourth of all students, when ranked
by achievement) lost ground relative to those in the bottom quartile in
reading, mathematics, and science in the National Assessment of Educational
Progress. This pattern appeared in all three age groups tested (ages
9, 13, and 17), but it took different forms at different ages, probably
as a result of the cohort pattern shown by the end of the decline. At age
nine, gains predominated over losses, but the lowest quartile showed larger
gains than did the highest. At age 17, declines predominated, with the
larger losses generally appearing in the highest quartile. At age 13, gains
[...], but the lowest quartile still showed relative gains.

While the narrowing of the gap between the top and bottom achievement
quartiles on the NAEP is clear-cut, other data cast doubt on the
extent to which this was a general trend over the past two decades. Similar
trends appear in some data (such as the Illinois Decade Study and some
tabulations of the SAT), but not in others (such as other tabulations of the
SAT). 18/ Moreover, under most circumstances, a narrowing of the gap
between the top and bottom quartiles would cause the standard deviation of
test scores (that is, their variability) to decrease. That has not been the
case, however, in the few data sources for which historical
records of standard deviations are available. Since the early 1970s, the
standard deviation of the SAT has increased
slightly. The SD of the ITBS has been increasing, while that of the SRA
achievement series has shown mixed trends (generally inconsistent with
18. The Illinois Decade Study is a comparison of the performance of Illinois high school
students on a battery of achievement tests in the 1970 and 1980 school years.






the NAEP pattern in the earlier grades, but consistent in the higher
grades). 19/

Several explanations of this inconsistency are plausible. Some of the
variation among tests could simply be an artifact of scaling differences.
For example, the Illinois Decade Study is consistent with the NAEP in its
published form, which presents simple differences in scores, but is
inconsistent when presented in terms of proportional changes in scores. Differences
in the way students are classified as high- and low-achieving could also
account for much of the variability. For example, classifying students in
terms of their self-reported class rank yields patterns on the SAT since 1976
that are consistent with the NAEP (even though the standard deviation of
the SAT was increasing at that time), while classifying students in terms of
their rank on the SAT itself yields trends that are inconsistent with the
NAEP. On the other hand, some of the inconsistency might reflect true
variation among tests; perhaps the lowest quartile gained relative to the
highest only on certain types of tests.
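
The scaling point, that simple differences and proportional changes can rank the same two groups differently, can be shown with a small worked example. The scores below are hypothetical and were chosen only to make the reversal visible; they are not taken from the Illinois Decade Study or any other source cited here.

```python
def simple_change(before: float, after: float) -> float:
    """Change in score points."""
    return after - before


def proportional_change(before: float, after: float) -> float:
    """Change as a fraction of the starting score."""
    return (after - before) / before


if __name__ == "__main__":
    # Hypothetical average scores for a high- and a low-achieving group.
    high_before, high_after = 80.0, 76.0
    low_before, low_after = 40.0, 37.0

    print("simple:", simple_change(high_before, high_after),
          simple_change(low_before, low_after))
    print("proportional:", proportional_change(high_before, high_after),
          proportional_change(low_before, low_after))
```

In these made-up figures the high group loses more points (4 versus 3), yet the low group loses the larger share of its starting score (7.5 percent versus 5 percent), so the two presentations point in opposite directions.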

Test scores of students taking college admissions tests (currently,
about half of all high school graduates) declined more than those of high
school seniors in general. But this difference primarily reflects the
changing composition of the group taking those tests rather than a greater
decline in achievement among high-achieving students. The proportion of
students taking the SAT, for example, grew substantially during the 1960s
and early 1970s, and this growth was accompanied by an increase in the
share of SAT candidates from historically lower-achieving groups, such as
certain ethnic groups and families of lower socio-economic status. 20/ Since
the early 1970s, however, such changes in the composition of the test-taking
group have been relatively minor. 21/

The highest-achieving students (those scoring highest on tests, taking
the most advanced courses, and so on) evidenced both the decline and the
subsequent upturn in achievement. These students did not show a
consistently greater decline than the average student. Indeed, by some measures,



19. The College Board, College-Bound Seniors, various years; American College Testing
Program, unpublished tabulations; H. D. Hoover, personal communication, March 1984;
and Science Research Associates, SRA Achievement Series, Technical Report #3, Table 2.

20. Advisory Panel on the Scholastic Aptitude Test Score Decline, On Further Examination.

21. Because compositional changes exacerbated the decline in the SAT but not the
subsequent upturn, comparing the SAT upturn to the previous decline is misleading.
The relative size of the upturn is understated unless adjustments are made to compensate
for the compositional changes.
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they appear to have gained recently relative to the average, particularly 
in the area of mathematics. For example, the proportion of SAT candidates
scoring over 700 on the mathematics test has risen sharply in the last few
years (from 2.7 percent in 1980 to 3.6 percent in 1984) and is now quite
close to the level of 1966, the highest level in any year for which
tabulations are available. Similarly, American seniors taking calculus and
pre-calculus (together about 10 percent to 12 percent of all
seniors) showed gains between 1964 and 1981 in international assessments of
mathematics achievement. The sketchiness and inconsistency of data on the
highest-achieving students, however, cloud these conclusions.



RACE AND ETHNICITY 



Recent years have seen a shrinking of the long-standing difference between
the scores of black and nonminority students on a variety of achievement
tests. The evidence pertaining to other ethnic groups is more limited, but
there are suggestions of relative gains by Hispanic students as well. 22/
While the change has been small relative to the remaining gap between the
minority and nonminority students, it has been consistent from year to



22. The term "ethnicity" as used in the following discussion encompasses some distinctions
(such as that between blacks and whites) that are often popularly termed racial. This
convention is followed in part for simplicity, but also because some of the most common
current categories have at best ambiguous racial bases. For example, many South Asians
are often classified as nonwhite (as in some Census tabulations), even though most South
Asians are in fact racially Caucasian. Similarly, people of mixed black/white origin
are frequently classified as black without regard to whether the greater proportion
of their ancestry is in fact white or black. Hispanics are almost all classified as whites
in Census tabulations, even though many of them are racially mixed. (In particular,
many are partially or primarily native American in origin, and native Americans are
racially classified as "Mongoloid" (that is, Asian) people.)

The ethnic categories used in this paper necessarily reflect the disparate conventions
used in the data sources cited and therefore vary among tests. In general, the term
nonminority excludes, to the extent possible, all minority groups identified in each
data source and usually corresponds to the category labeled "white" in the cited sources.
The data sources vary considerably, however, in terms of how many, and which, groups
are specifically identified. Moreover, some individuals, such as black Hispanics, can
be classified in more than one way, and there is typically little information available
about how those ambiguous cases are handled.

The more important known variations in the classifications used in the various sources 
are noted in Appendix E. 
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year and could prove substantial over the long run* These patterns are 
summarized below and are discussed in more detail in Appendix E.

Trend data on the scores of different ethnic groups are very limited,
however, and generally extend back only a relatively short length of time.
In addition, since many ethnic-group differences in achievement are large,
the ambiguity inherent in measuring changes in the gaps between achievement
subgroups described above applies to these comparisons as well. In
this case, however, the pattern of the trends leaves no doubt that the
closing of the gap is at least in part real and not an artifact of the tests. 23/
Finally, classification of students' ethnicity is likely to be prone to error,
both because of the unreliability of students' self-reports and because of the
ambiguity, and lack of consistency over time, of ethnic classifications.
While this is unlikely to be a serious source of bias in interpreting trends
among black students, it is cause for caution in considering data about
Hispanics. 24/

Black Students. In general, it appears that the average scores of black
students:

o Declined less than those of nonminority students during the later
years of the general decline;

o Stopped declining, or began increasing again, earlier; and

o Rose at a faster rate after the general upturn in achievement
began.



23. This narrowing of the gap is substantiated by several factors. First, the pattern is
consistent among a variety of very different tests. Second, during certain periods, the
convergence reflected gains among blacks concurrent with declines among nonminority
students. Unlike differences in relative gains (or declines) between groups, a pattern
of gains in one group and declines in the other is unlikely to be an artifact of the scaling
method used and will generally persist even if the data are rescaled. Third, biases caused
by ceiling effects have been largely ruled out. In the case of tests scored as the percent
of questions answered correctly, the scores of the higher-achieving group can be held
down by a ceiling effect, creating an illusion that lower-achieving groups are gaining
in comparison. To lessen the likelihood of such a distortion, data of that sort were
transformed (by a logit transformation) to eliminate ceilings, and the narrowing of the
gap remained.
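
The logit transformation mentioned in this note maps a proportion correct p to ln(p / (1 - p)), which stretches the scale near 0 and 1 so that scores close to the maximum are no longer compressed. The sketch below is only an illustration of that idea: the proportions are hypothetical, and the helper name `logit_gap` is not drawn from the cited analyses.

```python
import math


def logit(proportion_correct: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    if not 0.0 < proportion_correct < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    return math.log(proportion_correct / (1.0 - proportion_correct))


def logit_gap(higher: float, lower: float) -> float:
    """Gap between two groups after the logit transformation."""
    return logit(higher) - logit(lower)


if __name__ == "__main__":
    # Hypothetical proportions correct for two groups in two years.
    before = (0.90, 0.60)
    after = (0.92, 0.66)
    print("raw gap:  ", before[0] - before[1], "->", after[0] - after[1])
    print("logit gap:", logit_gap(*before), "->", logit_gap(*after))
```

Comparing the gap on both scales is a simple check on whether an apparent convergence depends on the higher-scoring group being close to the ceiling.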

24. See, for example, "Problems in Defining Ethnic Origin," Appendix A in Congressional
Research Service, Hispanic Children in Poverty (Washington, D.C.: CRS, September
13, 1985).
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The relative gams of black students appear on a variety of tests 
administered to students of different ages in different localities. They
appear at ages 9, 13, and 17 in the National Assessment of Educational
Progress; in a comparison of nationally representative samples
of high school seniors in 1971 and 1979; in grades 3, 6, and 9 in the North
Carolina state assessment program; among ninth graders in the Texas state
assessment program; and in test data from some local education agencies
such as Cleveland, Houston, and Montgomery County (Maryland). 25/

The SAT data suggest that part of the convergence of black and
nonminority scores resulted from the decline ending earlier among black than
among nonminority students. The convergence of scores continued during
the general upturn, however, as black students gained more
rapidly than did nonminority students.

Although this shrinking of the gap has been small relative to the
average differences between black and nonminority students, the rate of
change has been appreciable. For example, over the past nine years, the gap
between black and nonminority students on the SAT has shrunk at an annual
rate roughly comparable to the average rate of the total SAT decline, a
change that few people would label insignificant. On the National
Assessment, the average black student's mathematics score was a third
below the nonminority average in 1972 but a fourth below that in 1981.



25. NAEP, Three National Assessments of Reading; NAEP, The Reading Report
Card; NAEP, The Third National Mathematics Assessment; and NAEP, Mathematical
Technical Report [...]; [...] on SAT, Other Measures of Educational
Achievement (New York: The College Board, [...]), Tables D-1, D-2, and D-3;
Nancy W. Burton and Lyle V. Jones, "Recent Trends in Achievement Levels of
Black and White Youth," Educational Researcher, vol. 11 (April 1982);
[...] Montgomery County (Maryland) Public School District, "MCPS
Test Results by [...] Groups, [...]-1982," unpublished paper; Marian
[...], personal communication, March [...]; and [...] School [...].

Changes in the gap on the ACT are not entirely consistent with this pattern.
[...] only slightly, and the trend has been highly [...] year to year. In
addition, the trend varies among subjects; the gap narrowed in social studies,
for example, [...] in mathematics. This partial inconsistency with the pattern
in other tests is discussed further in Appendix E.
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Figure IV-5. 

Trends in Average Reading Proficiency for White, Black, and
Hispanic Students, by Birth Year 
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SOURCE; National Aiieiiment of Educational Pragma. The Heethnp Report Cmd (Princeton: N AEP/ Educational 
Testing Service, 1985), Data Appendix.



It is likely, but not certain, that this narrowing of the gap will
continue to appear in some test data for several years. The NAEP data
show the most rapid convergence accompanying the birth cohorts of the
mid-1960s as they pass through school, appearing at age 9 in the early
1970s and at age 17 in the early 1980s. Some narrowing, however, appeared
at least as late as the birth cohorts of the late 1960s and perhaps as late as
those of the early 1970s. 26/ This pattern would suggest further
convergence between black and nonminority scores on high school tests for several
years. On the other hand, the SAT is inconsistent with this pattern; the
relative gains of black students on that test ended in 1981 and 1982.



26. National Assessment of Educational Progress, The Reading Report Card, Figure 3.2.
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Despite these changes, the gap between the overage scores of block 
and nonminority students remains striking. On the SAT, for example, the
average black students' scores in [...] corresponded roughly to the 11th and
18th percentiles among nonminority students on the mathematics and verbal
scales, respectively. In 1984, the average black scores had risen to about
the 18th percentile among nonminority scores on both scales. 27/ While some
other tests show smaller average differences than the SAT, the gap
nonetheless remains large by virtually any measure.

Hispanic Students. In national samples, Hispanic students on average show
lower levels of achievement than nonminority students, although
somewhat higher achievement than blacks. In recent years, the average
achievement of Hispanic students, like that of blacks, has risen relative to
that of nonminority students.

Conclusions about trends among Hispanic students,
however, are subject to important qualifications. First, the relevant data
are more limited than in the case of blacks. More important, the term
Hispanic subsumes many groups differing in culture of origin, length
of residence in the United States, relative fluency in and use of English,
and other characteristics that presumably affect educational performance.
Trends among Hispanic students as a whole provide only suggestions of
trends that might be occurring among more specific groups that are often the
targets of specific educational programs, such as children with limited
proficiency in English, or the children of migrant farm workers.

With those qualifications in mind, the relative improvement of Hispanic
achievement is apparent in the NAEP reading and mathematics
assessments (see Figure IV-5), in the SAT, in the Texas state-wide assessment
program, and in a comparison of nationally representative
samples of high school seniors in 1971 and 1979 (the NLS and HSB
comparison). 28/ This trend appears not to be limited to one Hispanic group;
it appears among both Mexican-American and Puerto Rican
students on the SAT and among both Mexican Americans and "other
Hispanics" in the NLS and HSB comparison, although the improvement
among Mexican Americans is in several instances greater. 29/ The annual



27. [...] nonminority within-group standard deviations in
Solomon Arbeiter, Profiles, College-Bound Seniors, 1984 (New
York: The College Entrance Examination Board, 1984), p. 81.

28. Rock and others, Factors Associated With Decline of Test Scores, Appendix D. In this
[...] Hispanic and nonminority [...]

29. [...] Hispanic subgroups in the NLS and HSB comparison, however [...]






SAT data suggest that among Hispanics, as among blacks, the
achievement decline ended a few years earlier than it did among
nonminority students.



DIFFERENCES IN TRENDS

AMONG TYPES OF SCHOOLS AND COMMUNITIES 



While the achievement decline was pervasive, it has not been entirely
uniform among different types of communities and schools. This section
discusses the relative trends in three specific types of schools and
communities about which data are available:

o Disadvantaged urban communities;

o Schools with different concentrations of ethnic minorities; and

o Private schools.



Disadvantaged Urban Communities

Since 1970, 9- and 13-year-olds in disadvantaged urban communities have
gained ground relative to the nation as a whole on the NAEP mathematics
and reading assessments (see Tables IV-1 and IV-2). 30/ In contrast, 17-
year-olds in disadvantaged urban communities showed no relative gains in
mathematics, and their small relative gains in reading occurred entirely
between 1975 and 1983. In two instances--in reading at age 9, and in
mathematics at age 13--more than a third of the gap between disadvantaged
urban communities and the nation as a whole was overcome since the
early 1970s. 31/
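
The share of the gap that was closed can be checked from the "Nation Minus
Disadvantaged Urban" rows of Tables IV-1 and IV-2. The Python sketch below is
illustrative only: the function name is an assumption introduced here, and
the inputs are the rounded gaps printed in the tables, so the results are
approximate.

```python
def gap_change_percent(gap_start, gap_end):
    """Percent change in the nation-minus-group gap between two assessments."""
    return 100.0 * (gap_end - gap_start) / gap_start


# Rounded gaps as printed in the tables:
# reading at age 9 (1970 to 1983) and mathematics at age 13 (1972 to 1981).
reading_age9 = gap_change_percent(29, 19)
math_age13 = gap_change_percent(17.1, 11.2)

# A value below -33.3 means that more than a third of the gap was closed.
print(round(reading_age9), round(math_age13))
```

Because the calculation is expressed relative to the initial gap, it says
nothing about the absolute level of achievement in either group.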



30.   For a school to be defined as "disadvantaged urban," it had to be located within either
      the city limits or the urban fringe of a city of at least 200,000 people (or twin or triplet
      cities with combined populations of over 200,000); and it had to serve a community that
      had unusually few managerial and professional personnel and atypically many
      unemployed adults and adults on welfare. The latter criterion was implemented through
      four steps: asking the principal of each school to estimate the proportion of students
      whose parents fell into those categories; summing the percentages on welfare and
      unemployed; subtracting the percentage professional or managerial; and selecting the
      schools that constituted the top 10 percent on the resulting index. (Westat Corporation,
      unpublished NAEP documentation).
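
      The four steps described in this footnote can be sketched in Python as follows.
      This is a minimal illustration, not the procedure actually used for the NAEP: the
      function names, the school data, and the handling of ties and of the cutoff count
      are all assumptions, and the separate urban-location requirement is not modeled.

```python
import math


def disadvantage_index(pct_welfare, pct_unemployed, pct_prof_managerial):
    """Welfare plus unemployed, minus professional or managerial (percentages)."""
    return pct_welfare + pct_unemployed - pct_prof_managerial


def select_top_share(schools, share=0.10):
    """Return the names of the schools in the top `share` of the index.

    `schools` maps a school name to the principal's three estimates.
    Rounding the cutoff count upward is an assumption of this sketch.
    """
    ranked = sorted(
        schools,
        key=lambda name: disadvantage_index(*schools[name]),
        reverse=True,
    )
    n_selected = max(1, math.ceil(share * len(ranked)))
    return ranked[:n_selected]


# Hypothetical estimates: (welfare, unemployed, professional or managerial).
schools = {
    "School A": (10, 5, 30),
    "School B": (25, 12, 10),
    "School C": (40, 20, 5),
    "School D": (5, 4, 45),
    "School E": (15, 8, 20),
    "School F": (30, 15, 8),
    "School G": (8, 6, 35),
    "School H": (20, 10, 15),
    "School I": (12, 7, 25),
    "School J": (35, 18, 6),
}

print(select_top_share(schools))
```

      Because the index rests on principals' estimates, small differences near the cutoff
      should not be overinterpreted.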

31.   In the case of mathematics, however, the amount by which the gap closed can be
      considered only approximate because the 1972 average scores are only estimates. See
      footnote a, Table IV-1.
TABLE IV-1.  AVERAGE MATHEMATICS ACHIEVEMENT IN
             DISADVANTAGED URBAN COMMUNITIES AND
             IN THE NATION, NAEP, 1972-1981
             (Average percent of items correctly answered)


                                                             Percent
                             1972                            Change
Group                   (Estimated) a/    1977      1981    1972-1981

                                  Age 9

Nation                      56.7          55.4      56.4        -1
Disadvantaged Urban         41.9          44.4      45.5         9
Nation Minus
  Disadvantaged Urban       14.8          11.0      10.9       -26

                                  Age 13

Nation                      58.6          56.6      60.5         3
Disadvantaged Urban         41.5          43.5      49.3        19
Nation Minus
  Disadvantaged Urban       17.1          13.1      11.2       -35

                                  Age 17

Nation                      64.0          60.4      60.2        -6
Disadvantaged Urban         51.5          45.8      47.7        -7
Nation Minus
  Disadvantaged Urban       12.5          14.6      12.5         0


SOURCES:  CBO calculations based on National Assessment of Educational Progress,
          The Third National Mathematics Assessment: Results, Trends, and Issues
          (1983), Tables 5.1 and 5.2; and Mathematics Technical Report: Summary
          Volume (1980), Tables 2, 3, and 4.

a.   These estimates for 1972 differ from published NAEP results for the 1972 assessment.
     The published results for that year are based either on the 1972 item pool or on the items
     used in both 1972 and 1977, while the trend results comparing the 1977 and 1981
     assessments reflect items used in both the 1977 and 1981 assessments. In order to
     circumvent the large disparities in the item sets, 1972 results were estimated here by
     adjusting the 1977 results (on the items used in 1977 and 1981) by the 1972-to-1977
     change (on the items used in 1972 and 1977).
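
     The adjustment described in footnote a can be sketched in Python as follows. The
     sketch assumes that the 1972-to-1977 change was applied additively (in percentage
     points); the footnote does not say whether the adjustment was additive or
     proportional. The function name and all input values are hypothetical.

```python
def estimate_1972(score_1977_on_1977_1981_items,
                  score_1972_on_1972_1977_items,
                  score_1977_on_1972_1977_items):
    """Back-cast a 1972 score onto the 1977-1981 item set.

    Assumes an additive adjustment: the 1972-to-1977 change measured on the
    items common to 1972 and 1977 is removed from the 1977 score measured on
    the items common to 1977 and 1981.
    """
    change_1972_to_1977 = (score_1977_on_1972_1977_items
                           - score_1972_on_1972_1977_items)
    return score_1977_on_1977_1981_items - change_1972_to_1977


# Hypothetical percent-correct values, not NAEP results.
estimate = estimate_1972(
    score_1977_on_1977_1981_items=55.0,
    score_1972_on_1972_1977_items=53.0,
    score_1977_on_1972_1977_items=52.0,
)
print(estimate)
```

     The estimate is only as good as the assumption that the change observed on one
     item set would also have been observed on the other.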
TABLE IV-2.  AVERAGE READING ACHIEVEMENT IN
             DISADVANTAGED URBAN COMMUNITIES AND
             IN THE NATION, NAEP, 1970-1983
             (Average proficiency scores)


                                                                          Percent
                                                                          Change
Group                      1970          1974          1979      1983    1970-1983

                                      Age 9

Nation                 [illegible]   [illegible]   [illegible]   213         3
Disadvantaged Urban        178           185           186       194         9
Nation Minus
  Disadvantaged Urban       29            25            28        19       -34

                                      Age 13

Nation                     254           255           257       258         2
Disadvantaged Urban        232           229           242       240         3
Nation Minus
  Disadvantaged Urban       22            26            16        18       -18

                                      Age 17

Nation                     284           285           285       288         1
Disadvantaged Urban        259           261           258       266         3
Nation Minus
  Disadvantaged Urban       25            24            26        22       -12


SOURCES:  National Assessment of Educational Progress, The Reading Report Card,
          Data Appendix.

NOTE:  Details might not add to totals because of rounding.
Schools With High or Low Concentrations of Minority Students

Although information on the relative trends in high- and low-minority
schools is limited, such data as are available suggest that, relative to the
nation as a whole, high-minority schools have gained in achievement while
low-minority schools have lost ground. While the available analyses of these
data do not clarify whether the gains of minority students have been larger
or smaller in high-minority schools, they do indicate that the relative gains
of minority students as a group cannot be attributed entirely to improved
performance of those attending low-minority schools. At all ages,
mathematics gains between the last two National Assessments (1977 and
1981) were several times as large in schools that had minority enrollments
of at least 40 percent as in other schools (see Table IV-3). Similarly, in a
comparison of the HSB and NLS test results, seniors in low-minority schools--
defined as at least 90 percent nonminority--showed, on average, larger
declines from 1972 to 1980 than did seniors in other schools. In the case of
vocabulary, the decline in low-minority schools was 83 percent larger than
in other schools. The difference was about half that size in mathematics,
and a fourth in reading. 32/



Private Schools 

The achievement decline occurred among high school students in private as
well as public schools. Moreover, it appears to have been nearly as large
among private school students in reading and vocabulary, although somewhat
smaller in mathematics (if tests of reading, vocabulary, and mathematics
administered to seniors during the last half of the decline are an adequate
indication). 33/ Beyond that, very little can be said about the relative
trends among private school students, because of the extremely sparse data.
For example, whether the upturn in achievement found in public school and
nationally representative data--the latter of which is dominated by the far
more numerous public school students--occurred in private schools as well is
not yet known.



32.   Rock and others, Factors Associated with Decline of Test Scores, Appendix D.

33.   Ibid.
TABLE IV-3.  AVERAGE MATHEMATICS ACHIEVEMENT IN
             HIGH-MINORITY AND LOW-MINORITY SCHOOLS,
             NAEP, 1977 AND 1981 (Average percent
             of items correctly answered)


                                                     Percent
                                                     Change
Group                          1977       1981      1977-1981

                                  Age 9

Nation                         55.4       56.4         1.8
40 Percent or More
  Minority                     46.4       48.8         5.2
Less than 40 Percent
  Minority                     57.6       58.6         1.7

                                  Age 13

Nation                         56.6       60.5         6.9
40 Percent or More
  Minority                     45.5       53.6        17.8
Less than 40 Percent
  Minority                     59.6       62.4         4.7

                                  Age 17

Nation                         60.4       60.2        -0.3
40 Percent or More
  Minority                     47.5       52.3        10.1
Less than 40 Percent
  Minority                     62.4       62.4         0.0


SOURCE:  National Assessment of Educational Progress, The Third National
         Mathematics Assessment: Results, Trends, and Issues (1983), Table 5.2.
The SAT decline was seen among both private and public school
students. 34/ Since the selection changes that contributed to the SAT
decline might have been very different among private school students,
however, a comparison of the size of the SAT decline in the two groups of
students would be risky.



34.   Advisory Panel on the Scholastic Aptitude Test Score Decline, On Further Examination,
APPENDIX A



DESCRIPTION OF MAJOR DATA SOURCES 



This Appendix briefly describes the most important data sources used in 
the text and in other appendixes. These sources are: 

     o    Two college-admissions tests--the Scholastic Aptitude Test (SAT)
          and the American College Testing Program (ACT) tests;

     o    The National Assessment of Educational Progress (NAEP);

     o    The test data from two nationally representative studies of high
          school students--the National Longitudinal Study of the High
          School Seniors Class of 1972 (NLS) and the High School and
          Beyond study (HSB); and

     o    Annual statewide test data from Iowa.



THE SCHOLASTIC APTITUDE TEST 



The Scholastic Aptitude Test (SAT), sponsored by the College Board and
administered by the Educational Testing Service, is intended to aid colleges
in selecting students for admission. It is perhaps the single best known test
in the United States and has figured prominently in discussions of achieve-
ment trends for a decade or more.

The SAT is taken by a large number of students, but they constitute a
clearly nonrepresentative group. Students taking the test are predominantly
those intending to attend college, have higher levels of achievement than
does the student body as a whole, and are concentrated in certain geograph-
ic regions. In the 1984-1985 school year, the SAT was taken by nearly one
million high school students, representing over a third of all graduates and
about two-thirds of college-bound graduates. 1/ Nonetheless, it was the



1.    The College Entrance Examination Board, National College-Bound Seniors, 1985 (New
      York: The College Board, 1985). The number of high school graduates in the 1984-
      1985 school year, excluding high school equivalency credentials, has been projected
      to be about 2.6 million. National Center for Education Statistics, Projections of Education
      Statistics to 1990-91 (Washington, D.C.: NCES, 1982), Table 15.
principal college admissions test in only 22 states, which were primarily in
the east and on the west coast. 2/

The SAT consists of two tests, one mathematical and one verbal. 3/
The verbal test consists of analogies, antonyms, sentence completions, and
reading passages. 4/ The mathematics test consists of a variety of problems
in arithmetic reasoning, algebra, and geometry that are intended to "require
as background mathematics typically taught in grades one through nine" but
to "depend less on formal knowledge than on reasoning." 5/

The SAT is designed to predict achievement in college, not to directly
assess achievement in secondary schools. Accordingly, the test has been
validated primarily by documenting that students scoring higher on the test
tend to have higher grades in college. 6/ In contrast, tests intended to
assess students' current levels of mastery are typically validated by showing
that students scoring higher on the test in question tend to score higher on
some other measure of current achievement, such as teachers' evaluations
or other achievement tests. 7/

Although the SAT is designed to be a predictor of college performance
and was neither intended nor validated as an achievement test, it has often
been used as an index of achievement--despite strong objections from the



2.    State Education Statistics: State Performance Outcomes, Resource Inputs, and Population
      Characteristics, 1982 and 1984 (Washington, D.C.: U.S. Department of Education,
      January 1985).

3.    A third scale, the "Test of Standard Written English," was first added on an experimental
      basis in the mid-1970s; it is not discussed in this paper.

4.    The College Board, College-Bound Seniors.

5.    Advisory Panel on the Scholastic Aptitude Test Score Decline, On Further Examination
      (New York: The College Board, 1977), p. 9.

6.    Hunter M. Breland, Population Validity and College Entrance Measures (New York:
      The College Board, 1979). It is well established that high SAT scores are associated
      with higher grades early in college. The extent to which the SAT provides information
      about likely college performance above and beyond that provided by other indices such
      as high school grades is a matter of some disagreement. That issue, however, is not
      germane to the use of SAT scores in this paper. (See, for example, James Crouse, "Does
      the SAT Help Colleges Make Better Selection Decisions?" Harvard Educational Review,
      vol. 55, May 1985, pp. 195-219; and George H. Hanford, "Yes, the SAT Does Help
      Colleges," Harvard Educational Review, vol. 55, August 1985, pp. 324-331.)

7.    See, for example, Science Research Associates, SRA Achievement Series, Technical Report
      #3 (Chicago: SRA, 1981).
College Board. 8/ For example, much of the public debate about declining
achievement focused at least in part on the SAT, and the annual compilation
of state education statistics by the U.S. Department of Education calls the
test a "performance outcome" (rather than a "predictor of performance"). 9/

The SAT is administered several times each year, and the scores 
obtained in each year are equated, so that any given score should reflect 
approximately the same level of skill in any year. Annual publications 
provide detailed tabulations of the scores of the test-taking group as a
whole and of a variety of subgroups, such as males, females, and ethnic 
groups. Data on student characteristics such as these are mostly based on a 
Student Descriptive Questionnaire (SDQ) completed by students, and the 
information is therefore subject to distortions stemming from both non- 
response and various kinds of reporting errors. 

Data on the SAT extend back longer than those on most other tests,
but the long-term data used in this paper are subject to several inconsisten-
cies. Current tabulations by the College Board reflect only the most recent
test taken by students who also completed the SDQ--about 90 percent of all
SAT candidates. 10/ Average scores from the 1966-1967 through 1970-1971
school years are College Board estimates of the averages that would have
been obtained if such tabulations had been made for those years. Data from
the 1956 through 1965 school years are based on the average of all scores,
which includes multiple scores by those taking the SAT more than once. 11/
The published data on these averages of all scores were adjusted by
subtracting from them the slight difference in 1966 between that average
and the average based on only the most recent of each individual's scores.
Trend data on the proportion of SAT scores above specific thresholds were
subject to a similar discontinuity and were similarly adjusted, but in that
case the adjustment was based on the average discrepancy in averages over


8.    See, for example, statement by Daniel B. Taylor, Senior Vice President, the College
      Board, before the House Subcommittee on Elementary, Secondary, and Vocational
      Education, Committee on Education and Labor, January 31, 1984.

9.    State Education Statistics: State Performance Outcomes, Resource Inputs, and Population
      Characteristics, 1982 and 1984 (Washington, D.C.: U.S. Department of Education,
      January 1985).

10.   The College Board, College-Bound Seniors, 1986, p. 4.

11.   Hunter M. Breland, The SAT Score Decline: A Summary of Related Research (New
      York: The College Board, 1976), Table 1.
the period (... through 1974) for which both averages were available. 12/



THE AMERICAN COLLEGE TESTING PROGRAM TESTS 



The American College Testing Program (ACT) tests, like the SAT, are
intended as an aid in selecting students for admission to college. The ACT
tests were taken by about 789,000 high-school students in the class of 1984-
1985--over a fourth of all graduates. Although the ACT battery is taken by
fewer students than is the SAT, it is the predominant college-admissions
test in 28 states--primarily in the Midwest, the western mountain states,
and parts of the Southeast. 13/

Although also intended to predict success in post-secondary education,
the ACT is conceptually distinct from the SAT and is in some senses
intended to be more of a test of achievement. The ACT is more "curriculum
based" than is the SAT, relying on both reasoning ability and knowledge of
subject-matter fields. Despite its intentional reliance on subject-matter
knowledge, however, the ACT contains many "analytical, problem-solving
exercises and few measures of narrow skills." 14/

The ACT battery consists of subject-matter tests in English, math-
ematics, social studies, and natural science, yielding four subject-specific
scores as well as a composite score. The English test is a test of usage,
tapping skills such as grammar, sentence structure, and paragraph organiza-
tion. The mathematics test is dominated by questions on arithmetic and
algebraic reasoning, geometry, and intermediate algebra, but a fourth of the
test is devoted to arithmetic and algebraic operations, number concepts, and
advanced topics. The social studies test includes aspects of history,
government, anthropology, sociology, psychology, and economics. The



12.   Ibid., Table 6.

13.   American College Testing Program, National ACT Assessment Results, 1984-1985
      (Iowa City: ACT, 1985); U.S. Department of Education, ...

14.   ... ACT Assessment (Iowa City: American College Testing Program, ...
natural sciences test is about evenly divided between chemistry, physics,
other physical sciences, and biology. 15/

The ACT is reported and equated annually. Trend data reflecting
subgroups of students are available but are less extensive than those
available for the SAT.

The long-term ACT trend data used in this paper are subject to one
inconsistency. Scores from 1969 on are taken from internally consistent
tabulations published by ACT. 16/ Earlier data are adapted from tabulations
that differ from the more recent data in including scores from "residual"
testing of students on college campuses, who have lower average scores than
those taking the test before college. 17/ These earlier averages were
adjusted by adding to them the small difference in 1969 between them and
the averages consistent with later data. 18/
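
The splicing adjustment described above, in which an earlier series is shifted
by its difference from the later series in the one overlapping year, can be
sketched in Python as follows. The function name and every score shown are
hypothetical and are introduced only for illustration; they are not ACT
averages.

```python
def splice_series(earlier, later, overlap_year):
    """Shift the earlier series so that it agrees with the later series
    in the overlap year, then join the two series."""
    offset = later[overlap_year] - earlier[overlap_year]
    spliced = {
        year: score + offset
        for year, score in earlier.items()
        if year < overlap_year
    }
    spliced.update(later)
    return spliced


# Hypothetical averages. The earlier tabulation includes lower-scoring
# "residual" test takers, so it runs slightly below the later tabulation.
earlier = {1967: 19.0, 1968: 18.8, 1969: 18.6}
later = {1969: 18.9, 1970: 18.7}

spliced = splice_series(earlier, later, overlap_year=1969)
print(sorted(spliced.items()))
```

An adjustment of this kind assumes that the discrepancy between the two
tabulations was the same in every earlier year as in the overlap year.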



THE NATIONAL ASSESSMENT OF EDUCATIONAL PROGRESS 



The National Assessment of Educational Progress (NAEP) is a critical 
indicator of achievement trends, for it alone among current data sources 
provides repeated testing of nearly representative samples of the national 
student population. 

Before the NAEP was begun, available data often provided an indica-
tion of achievement patterns and trends in smaller areas--that is, in
schools, districts, or occasionally states. But variations in assessment
methods from one jurisdiction to another precluded using these data as an
unambiguous indicator of achievement across the entire nation.

In contrast, the NAEP was designed to be a measure of the perfor- 
mance of the nation's elementary and secondary educational system as a 
whole. It was not intended to duplicate the assessment mechanisms already 
in place. For example, it was intended to assess relatively general levels of 



15.   American College Testing Program, Content of the Tests.

16.   For example, American College Testing Program, National Trend Data for Students
      Who Take the ACT Assessment (Iowa City: ACT, undated).

17.   James Maxey, American College Testing Program, personal communication, April 1984.

18.   The unadjusted earlier data are in L. A. Munday, Declining Admissions Test Scores
      (Iowa City: American College Testing Program, 1976), Table 3.
knowledge, and it was not designed to differentiate among individuals. It
was to supplement those other measures by providing a consistent, broad
measure of the achievement of a largely representative sample of the
nation's youth that would be periodically repeated. 19/

Since 1969, the NAEP has provided periodic testing of 9-, 13-, and 17-
year-old students in 10 subject areas. The intervals between assessments in
any subject area typically have ranged from three to five years. The best
known assessments are in the areas of reading, writing, mathematics,
science, and social studies. 20/

Although the NAEP is nearly representative of students nationwide, it
excludes several important groups. In most instances, the NAEP has tested
only those individuals still in school. 21/ In the case of 17-year-olds, this
practice leads to results that are probably quite different from those that
would be obtained if all 17-year-olds were tested, since dropouts are
numerous in that age group and tend to be low achievers. The overall
average score is thus higher than it would be, and comparisons between
groups (ethnic groups, regions, and so on) reflect differences in dropout
rates as well as achievement differences in the entire age cohort. In
addition, handicapped students and those with limited proficiency in English
are excluded from testing, although the definition of those categories can
vary somewhat from one participating school to another. Both of these
exclusions are germane to the assessment of trends, since the period over
which the NAEP has been conducted saw the passage of the Education of the
Handicapped Act (which most likely increased the number of handicapped
students in regular school programs markedly) and rapid immigration from
Latin America and Asia. Finally, participating schools have some discretion
to exclude other students who cannot be assessed properly. 22/
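
The effect of testing only in-school 17-year-olds can be illustrated with a
simple weighted average. The Python sketch below is a hypothetical example:
the function name, the scores, and the dropout share are assumptions chosen
for illustration and are not NAEP estimates.

```python
def full_cohort_mean(in_school_mean, dropout_mean, dropout_share):
    """Weighted mean score for the whole age cohort, dropouts included."""
    if not 0.0 <= dropout_share <= 1.0:
        raise ValueError("dropout_share must be between 0 and 1")
    return (1.0 - dropout_share) * in_school_mean + dropout_share * dropout_mean


# Hypothetical values, not NAEP results.
in_school_mean = 60.0   # average among students still in school
dropout_mean = 45.0     # average that dropouts would have obtained
dropout_share = 0.15    # share of the age cohort no longer in school

cohort = full_cohort_mean(in_school_mean, dropout_mean, dropout_share)
overstatement = in_school_mean - cohort
print(round(cohort, 2), round(overstatement, 2))
```

If two groups have different dropout shares, the same calculation shows why
a comparison of their in-school averages mixes achievement differences with
differences in dropout rates.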



19.   Director's Report to the Congress on the National Assessment of Educational Progress
      (Washington, D.C.: National Institute of Education, December 1982).

20.   At various times, the National Assessment has included tests of other groups and subjects
      that are not considered here.

21.   Brief descriptions of the NAEP sampling procedure are provided in a number of
      publications. See, for example, National Assessment of Educational Progress,
      Mathematics Technical Report: Summary Volume (Denver: NAEP/Education
      Commission of the States, 1980), Chapter 1.

22.   Lawrence Rudner, Office of Educational Research and Improvement, U.S. Department
      of Education, personal communication, December 1985.






The NAEP tests are designed to assess a range of skills varying in
difficulty. In mathematics, for example, the easiest items tap recall of
factual information and simple arithmetic computation. More difficult
items require an ability to manipulate algebraic expressions, to comprehend
and explain mathematical relationships, and to apply skills in solving
problems. 23/

For the purposes of this paper, the principal advantages of the NAEP
are its nearly representative sampling, its diversity of subject areas and
levels of skills, and a considerable amount of background information. A
variety of characteristics of students, schools, and communities were
ascertained through student, teacher, and school questionnaires. These data
permit comparisons of trends, for example, among ethnic groups, geographic
regions, and schools with high and low minority enrollments.

These advantages are mitigated, not only by the time intervals
between assessments, but also by the forms in which data were presented
and the lack of formal equating of scores from one assessment to another.
Until recently, scores were generally only reported as the percentage of
items answered correctly--a scaling that has some intuitive appeal but one
that poses serious problems in gauging trends and, especially, in comparing
trends among groups. 24/ In addition, information on the standard deviation
of average scores was often not reported or retained, limiting the extent to
which the severity of trends could be quantified and compared with that on
other tests. Beginning with the most recent assessment of reading, these
problems have in large part been solved, but most of the trend data remain
in the original form. Scores were also not formally equated until recently,
posing problems in the interpretation of trends that were compounded by
periodic alteration of the content of the tests. A frequent, but not fully
adequate, response to this problem in the published NAEP data was to base
comparisons only on items shared by adjacent assessments.
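
Restricting a comparison to the items shared by two adjacent assessments can
be sketched in Python as follows. The item labels, the percent-correct values,
and the function names are hypothetical and serve only to illustrate the
approach; they are not drawn from NAEP item pools.

```python
def mean_percent_correct(item_scores, items):
    """Average percent correct over the named items."""
    return sum(item_scores[item] for item in items) / len(items)


def change_on_shared_items(earlier, later):
    """Change in average percent correct, using only the items that
    appear in both assessments."""
    shared = sorted(set(earlier) & set(later))
    if not shared:
        raise ValueError("the two assessments share no items")
    return (mean_percent_correct(later, shared)
            - mean_percent_correct(earlier, shared))


# Hypothetical percent correct by item in two adjacent assessments.
earlier = {"item_a": 80.0, "item_b": 60.0, "item_c": 40.0}
later = {"item_b": 64.0, "item_c": 42.0, "item_d": 90.0}

shared_change = change_on_shared_items(earlier, later)
naive_change = (mean_percent_correct(later, list(later))
                - mean_percent_correct(earlier, list(earlier)))
print(round(shared_change, 2), round(naive_change, 2))
```

In this example the comparison across all items overstates the gain because
the later assessment happens to include an easier new item. The shared-item
comparison removes that distortion but still rests on a subset of items that
may not represent either assessment fully.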



THE NATIONAL LONGITUDINAL SURVEY 
AND HIGH SCHOOL AND BEYOND 



Two nationally representative longitudinal studies of high school students--
the National Longitudinal Study of the High School Seniors Class of 1972



23.   See, for example, National Assessment of Educational Progress, Changes in
      Mathematical Achievement, 1973-78 (Denver: NAEP/Education Commission of the
      States, 1979).

24.   See Chapter II.
(NLS) and the High School and Beyond study (HSB)--provide comparative
information on the achievement of seniors in the 1971 and 1979 school
years. 25/

Both studies included a variety of cognitive tests, of which three that
were administered in both years--vocabulary, reading, and mathematics--
can be considered measures of achievement. 26/ The reading and vocab-
ulary tests were identical in the two studies; in mathematics, about half of
the items were identical, a fourth were altered in relatively minor respects,
and the remainder were new.

In one recent study, the scores on the NLS and HSB tests in those
three subject areas were equated, providing an indication of changes in
performance over the eight years. 27/ All comparisons of the NLS and HSB
in this paper are drawn from that study.

Information is available in the NLS and HSB about a considerable
number of important student, school, and community variables, making
possible both comparisons of achievement changes in different groups and
estimation of the effects of population changes (such as trends in the ethnic
composition of the school-age population) on average test scores. This
information is derived from school records, school questionnaires, and
teacher questionnaires, as well as from student self-reports, which increases
the validity of some of the information compared with that obtained solely
through student questionnaires. Moreover, in some instances, it permits
information from one source to be confirmed by comparing it with that from
another.
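
One common way to estimate the effect of a change in population composition
on an average score is to hold each group's average at its base-year level
and apply the later year's group shares. The Python sketch below illustrates
that approach under stated assumptions; it is not necessarily the method used
in the NLS and HSB analyses, and the group labels, shares, and scores are
hypothetical.

```python
def overall_average(shares, means):
    """Population average as the share-weighted sum of group averages."""
    return sum(shares[group] * means[group] for group in shares)


def composition_effect(shares_base, shares_later, means_base):
    """Change in the overall average attributable to shifting group shares,
    with each group's average held at its base-year level."""
    return (overall_average(shares_later, means_base)
            - overall_average(shares_base, means_base))


# Hypothetical shares of the senior population and base-year group averages.
shares_base = {"group_1": 0.80, "group_2": 0.20}
shares_later = {"group_1": 0.75, "group_2": 0.25}
means_base = {"group_1": 52.0, "group_2": 44.0}

effect = composition_effect(shares_base, shares_later, means_base)
print(round(effect, 2))
```

Any remaining change in the overall average would then reflect changes in
achievement within groups rather than changes in the mix of groups.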

The usefulness of the NLS and HSB for analyzing achievement trends
is limited by several factors, however. The absence of earlier, comparable



25.   The NLS and HSB tests were administered in the springs of 1972 and 1980, and most
      discussions of them refer to those calendar years. In order to be consistent with the
      treatment of other tests, however, this paper refers instead to the school years in which
      the tests were administered.

26.   Other tests tapped basic cognitive skills but could not be considered measures of
      achievement. For example, a mosaic comparisons test was included in 1972 as an index
      of "perceptual speed and accuracy." For a brief description of the two test batteries,
      see Donald A. Rock, Ruth B. Ekstrom, Margaret E. Goertz, Thomas L. Hilton, and Judith
      Pollack, Factors Associated with Decline of Test Scores of High School Seniors, 1972
      to 1980 (Washington, D.C.: Center for Statistics, U.S. Department of Education, 1985),
      Chapter II.

27.   Donald Rock and others, Factors Associated with Decline of Test Scores.
assessments precludes drawing conclusions about the decline as a whole or
placing the changes over the eight-year span into the context of longer-term
trends in achievement. The time interval between the two studies includes,
if other tests are an indication, a short period of rising scores as well as a
longer period of declining scores. This mixture could distort assessments of
the nature of the decline (particularly if the upturn does not parallel the
decline in all respects) and could bias assessments of the impact of
population changes on average scores.

IOWA TESTING PROGRAMS 



Although many states have statewide testing programs, the data from the
Iowa Testing Programs are uniquely valuable for the assessment of achieve-
ment trends. Unlike any other data source, they provide annually equated
data extending over three decades for most grade levels in a variety of
subject areas.

The Iowa data represent about 96 percent of public and private schools
in the state. 28/ Unlike most statewide achievement data, the Iowa data do
not reflect a mandatory, state-run program. Rather, they reflect voluntary
participation by school districts in two testing programs administered by the
University of Iowa. In grades 3 through 8, the test used is the Iowa Tests of
Basic Skills (ITBS); in grades 9 through 12, it is the Iowa Tests of
Educational Development (ITED). The ITBS is the same version as is
administered in a large number of districts nationwide, while the ITED used
in Iowa was a longer test than the version used elsewhere in the nation from
the early 1970s until the most recent version. 29/ In both cases, the Iowa
results are compared in this paper with statewide rather than national
norms.

Both the ITBS and ITED tap a wide range of subject areas. The ITBS 
comprises 13 subtests in the areas of reading, vocabulary, language skills, 
mathematics, and work study skills. Trend data are available for all 13
subtests, but in most instances, only trends in a single composite score are
reported in this paper. The ITED comprises seven tests: social studies, 
quantitative thinking, natural sciences, the interpretation of literary mater- 



28.   Iowa Basic Skills Testing Program, "Achievement Trends in Iowa: 1955-1985" (Iowa
      Testing Programs, unpublished and undated, 1985).

29.   Robert Forsyth, Iowa Testing Programs, personal communication, March 1984.
ials, general vocabulary, correctness of expression (English usage), and
sources of information (reference skills, knowledge of information sources,
and so on).

The ITED is atypical of elementary and secondary standardized tests
in that it includes no separate reading test. Instead, reading ability is
assessed in the context of the substantive-area tests. Only in the last few
years have the reading items from the various substantive-area tests been
combined to provide a separate "reading total" score. Therefore, the
"interpretation of literary materials" test, which taps many of the skills
commonly included in reading tests, is used as a surrogate for a reading test
in this paper, even though it is not a complete measure of the reading skills
assessed by the ITED. 30/

The ITED is intentionally less closely tied to curricula than are some
other standardized tests, although mastery of commonly taught materials is
certainly necessary for success on it. The test aims to assess the
intellectual skills that students will use in later life and those that represent
the "long-run goals" of secondary schools. 31/ This intent is reflected, for
example, in a very heavy emphasis on applications in the ITED quantitative
thinking test. 32/

One major advantage of the Iowa data for assessing achievement
trends is the length of the time span covered. Only the SAT provides data
for a comparably long period. The Iowa data, however, have several
additional advantages that the SAT does not share. The presence of data for
10 grade levels permits a clear assessment of the relationships between age
and achievement trends and provides the single clearest test of the cohort
pattern shown by the recent upturn in scores. The Iowa data also avoid two
of the major problems of nonrepresentativeness inherent in college-admis-



30.   For a summary of the content of the ITED tests, see Iowa Tests of Educational
      Development, Forms X-7 and Y-7: Manual for Teachers, Counselors, and Examiners (Iowa
      City: Iowa Testing Programs, 1978).

31.   Iowa Testing Programs, ITED Manual for Teachers, Counselors, and Examiners.

32.   Some of those working with the Iowa data believe that the much greater decline in
      mathematics scores shown by the grade-eight ITBS in comparison with the grade-nine
      ITED might reflect the fact that the ITED devotes more of its questions to applications
      and less to curriculum-based concept items than does the ITBS (Robert Forsyth, Iowa
      Testing Programs, personal communication, 1985).
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sions test data: the Iowa data include students at all achievement levels 
and with all levels of educational aspirations. In addition, the Iowa tests, 
unlike college-admissions tests, are intended and designed to assess 
achievement rather than to predict subsequent college performance. 

Nonetheless, the Iowa data have several important weaknesses for
present purposes. Most important is the fact that Iowa is clearly not
representative of the nation as a whole. For example, Iowa students on
average score substantially above the national mean. 33/ Moreover,
minority students constitute a far smaller share of enrollments in Iowa than in the
nation as a whole. 34/ Another limitation is that the available tabulations
of the Iowa data include little information about the performance of
important subgroups of students.



33. H. D. Hoover, Iowa Testing Programs, personal communication; Robert Forsyth, Iowa
Testing Programs, note to school administrators (Iowa City: Iowa Testing Programs, 
unpublished, 1984). 

34. As in the nation as a whole, however, minority enrollments have been increasing in
Iowa. In 1972, minority students constituted 2.4 percent of enrollments in Iowa and
21.7 percent in the nation as a whole; in 1980, those proportions had grown to 4.1 percent
and 26.7 percent, respectively (CBO tabulations of data from the Office of Civil Rights,
U.S. Department of Education).
EVIDENCE OF A COHORT EFFECT IN THE
RECENT UPTURN IN ACHIEVEMENT 



Chapter III notes that the end of the achievement decline and the
subsequent upturn conform more closely to a cohort pattern than to a period
pattern. This Appendix provides more detailed data indicating the extent to
which the trends conform to a cohort model. It has three sections:

o The first section explains the criteria that a data series must
meet to provide a test of the models and identifies the best
existing data for that purpose;

o The second section discusses the extent to which each of those
data series is consistent with both models; and

o The final section pulls together data from a variety of series to
provide a composite test of the models.

This Appendix is limited to the end of the decline and does not assess 
the extent to which the onset of the decline conforms to the period or 
cohort models. The data usable in assessing the characteristics of the onset 
of the decline are even more limited than those relevant to the decline's 
end. Thus, any characterization of the onset of the decline is largely 
speculative. 1/



TYPES OF DATA THAT CAN BE USED 

TO ASSESS COHORT AND PERIOD EFFECTS 



Few of the existing data series on elementary and secondary achievement 
provide strong tests of the cohort and period models. To offer a strong
test, a data series must: 



1. Even some data series that extend back to the mid-1960s give no real indication of the
timing of the decline's onset. Some of them (such as the social studies and mathematics
tests in the ACT battery) were already declining at the time of the first available data.
Moreover, two of the few test series with continuous data extending back into the 1960s
--the SAT and the ACT--were seriously affected by major compositional changes in
the test-taking population during the early years of the decline, leaving it unclear when
they would have begun declining in the absence of compositional changes.
o Provide annual or nearly annual scores;

o Provide appropriate equating of scores, so that scores in one year
can be considered comparable to those in other years;

o Extend over a period spanning at least one change in the direction
of achievement trends (that is, one point at which average
achievement stops increasing or stops decreasing); and

o Test reasonably comparable groups of students in different
years. 2/

Further, the best test of the models is provided by data series that also
provide similar measures of achievement at more than one grade level.
Measures that are available only for a single age group--such as the SAT--
provide a test of the cohort and period models only by comparing them with
other tests that reflect different ages. Such a comparison can be biased by
differences between the tests; the skills tapped by one test might show
different trends than those tapped by another, and such a difference might
be indistinguishable from a difference between cohorts or age groups. Few
relevant data series, however, provide comparable measures in different age
groups.

Within any single data series, the precise beginning of the decline or 
upturn is generally somewhat unclear, and therefore comparing several 
series is important. For example, the annual rate of change in test scores
during the period around the end of the decline is typically very small, and
average scores are therefore typically quite similar for a period of several
years. This similarity introduces uncertainty into a choice of any year as 
the low point of the series and often makes it more meaningful to label a 



2. The groups of students tested in each year need not be identical. Indeed, it is best if they
are not identical in certain respects. But the confidence one can place in the data is
lessened if the characteristics of those tested changed substantially more than the
characteristics of the school-age population as a whole. For example, a sample that
is entirely representative of the school-age population in each year would change over
time (in terms of characteristics such as ethnicity, family structure, and poverty rates)
as the school-age population changes. Such a sample would be optimal for testing the
period and cohort models. On the other hand, compositional changes in the test-taking
samples that are larger than those affecting the school-age population as a whole--such
as those affecting the SAT candidate pool in the 1960s--can be sizable enough to mask
period and cohort effects.






period of several years, rather than a single year, as the nadir.
Comparison of a variety of series helps to lessen this uncertainty.
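
The idea of labeling a span of years, rather than one year, as the nadir can be sketched in code. The following is a minimal illustration only: the function name, the tolerance parameter, and the scores are hypothetical and are not taken from any of the data series discussed here.

```python
def nadir_window(scores, tolerance):
    """Return (first_year, last_year) of the run of consecutive years
    around the minimum whose scores lie within `tolerance` of the minimum.

    `scores` maps year -> average score. `tolerance` is an assumed
    threshold for "effectively equal"; the report does not specify one.
    """
    years = sorted(scores)
    low_year = min(years, key=lambda y: scores[y])
    low = scores[low_year]
    start = end = years.index(low_year)
    # Extend the window backward while scores stay close to the minimum.
    while start > 0 and scores[years[start - 1]] - low <= tolerance:
        start -= 1
    # Extend the window forward in the same way.
    while end < len(years) - 1 and scores[years[end + 1]] - low <= tolerance:
        end += 1
    return years[start], years[end]


# Hypothetical scores, invented for illustration only.
example = {1972: 52.0, 1973: 50.6, 1974: 50.1,
           1975: 50.0, 1976: 50.2, 1977: 51.5}
print(nadir_window(example, tolerance=0.3))
```

With a tolerance of zero the window collapses to the single lowest year, which shows how sensitive a one-year nadir is to small differences in average scores.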

Given the criteria above, the following data series provide the 
strongest tests of the period and cohort models: 3/ 

o The Iowa Tests of Basic Skills, Iowa state series (ITBS-IA);

o The Iowa Tests of Educational Development, Iowa state series
(ITED-IA);

o The American College Testing Program (ACT) college-admissions
tests;

o The Scholastic Aptitude Test (SAT);
o The Virginia state assessment tests;
o The New York state assessment tests; and
o The California state assessment tests.

Two sources provide additional tests of the models, though they are
weaker because they are not annual. One is the National Assessment of
Educational Progress (NAEP). The second is the periodic renorming data
from commercial standardized elementary and secondary tests. The latter
are useful, however, only when publishers have retained data on equating
studies contrasting the norms derived in each year.



THE FIT OF THE DATA 

WITH THE COHORT AND PERIOD MODELS 



In this section, the fit of individual data series with the cohort and 
period models is examined. The patterns evident in the Iowa (ITBS and 
ITED) data are used as the point of comparison, since they provide the best 
single test of the models. The section first discusses data series that
provide strong tests of the models, while those providing weaker tests
(intermittent data, such as the NAEP) are left until the end of the section.



3. Additional detail on the characteristics of some of these data series can be found in
Appendix A.
The Iowa Tests of Basic Skills, Iowa State Series (ITBS-IA)

The ITBS Iowa-state series reflects the scores of nearly all Iowa students
through grade eight since the mid-1950s. In many respects, it is the best
data on trends in elementary and junior-high achievement. Its advantages
for the present purposes include:

o Equated data extending back to 1954, with annual data from 1984 
to the present; 

o Data on achievement in each grade through grade eight; and

o A general lack of problems with self-selection or other biasing 
selection changes in the student body taking the test. 

The greatest weakness of the ITBS-IA data is the fact that Iowa is in 
several important respects atypical of the United States. By some 
measures, average achievement in the elementary and secondary grades is 
nearly a grade higher in Iowa than in the nation as a whole. 4/ In addition,
Iowa is demographically more homogeneous than the nation as a whole.

Average ITBS-IA scores reached their low points later in higher grades
than in lower grades (see Figure III-2). Grade five scores bottomed out in
1974, grade six roughly in 1974, grade seven roughly in 1975, and grade eight
in 1976. The changes in average scores in grades three and four are so small
that it makes little sense to try to isolate a low point.

The later turnaround in higher grades suggests a cohort model, and the
trends in grades five through eight indeed line up more closely when
displayed in terms of birth years rather than year of testing (see Figure
B-1). In grades seven and eight, the lowest scores reflect the birth cohorts
of 1963 and 1964. The nadir occurred in grade six with the cohort of 1963,
while in grade five it coincided roughly with the birth cohort of 1964.



4. H. D. Hoover, Iowa Testing Programs, personal communication, January 1984.
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Figure B-1. 

ITBS Composite Scores, Iowa Only (By birth year 
and grade at testing) 

SOURCE: CBO calculations based on "Iowa Basic Skills Testing Program, Achievement Trends in Iowa: 1955-1985"
(Iowa Testing Programs, unpublished and undated material).



The Iowa Tests of Educational Development, Iowa State Series (ITED-IA)

The ITED-IA, which includes grades nine through twelve, has the same
strengths and weaknesses for the present purposes as does the ITBS-IA.
Given the steeper achievement decline in the higher grades, the low point in 
the ITED is more clearly defined than that in the ITBS. The timing of the 
low points, however, provides less clear-cut evidence in favor of the cohort 
or period model. 

When displayed in terms of test years, the ITED reached its low point
in 1977 in grades 9 through 11, but not until 1979 in grade 12 (see Figure
III-3). That is, grades 9 through 11 conform to a period model, while the
entire span of grades 9 through 12 does not. Accordingly, when displayed in
terms of birth years, the low points in the different grades do not fully line
up (see Figure B-2). Grades 10 and 12 reached their low points with the
1962 birth cohort, while grade 9 was one cohort later and grade 11, one
earlier.
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Figure B-2. 

ITED Composite Scores, Iowa Only (By birth year 
and grade at testing) 
SOURCE: CBO calculations based on "Mean ITED Test Scores by Grade and Subtest for the State of Iowa: 1962 to
Present" (Iowa Testing Programs, unpublished and undated tabulations).



If taken in the context of the ITBS results, however, the ITED trends 
can be seen as offering further support for the cohort model. Considering
the two series together is logical, for while substantively the ITBS-IA and
ITED-IA differ considerably, they largely reflect the same sample of
students. 

The earliest low point in the combined Iowa data occurred in 1974 in 
the grade five ITBS. The latest was in the ITED for grade 12, which reached
its low point five school years later. The nadir in the junior-high scores
occurred in between--roughly, in 1975 in grade seven, 1976 in grade eight,
and 1977 in grade nine. 

When tabulated in terms of birth cohorts, the low points in the 
combined Iowa data show less variation and less ordering from grade to 
grade. The earliest nadir was in the grade 11 ITED, which reached bottom 
with the 1961 birth cohort. All of the remaining grades reached their low
points with birth cohorts between 1962 and 1964. 5/
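
The comparison between test years and birth cohorts can be checked with a short calculation. This is a sketch only: the conversion rule (birth cohort = school year of testing minus grade minus 5) is an assumption inferred from the pairs quoted in the text, such as grade five tested in 1974 corresponding to the birth cohort of 1964, and grade six is left out because its low point is ambiguous.

```python
def birth_cohort(school_year, grade):
    # Assumed convention, inferred from pairs quoted in the text
    # (for example, grade 5 tested in 1974 <-> birth cohort of 1964).
    return school_year - grade - 5


# Low points by grade (school year of testing) as reported in the text.
# Grade six is omitted because its low point is ambiguous.
low_point_year = {5: 1974, 7: 1975, 8: 1976, 9: 1977,
                  10: 1977, 11: 1977, 12: 1979}

cohorts = {g: birth_cohort(y, g) for g, y in low_point_year.items()}

# Spread of the low points, measured both ways.
year_spread = max(low_point_year.values()) - min(low_point_year.values())
cohort_spread = max(cohorts.values()) - min(cohorts.values())

print(cohorts)
print("spread in test years:", year_spread)
print("spread in birth cohorts:", cohort_spread)
```

Under this assumed convention the low points span five test years but only three birth cohorts, and the test years rise steadily with grade while the cohorts do not.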

The Scholastic Aptitude Test (SAT) 

The SAT data have the advantage of providing largely comparable scores 
from 1966 to the present. In addition, studies of the equating of SAT scores
over time have been perhaps more extensive than those done with any other 
test. On the other hand, for present purposes, the SAT has several 
weaknesses: 

o Serious problems with self-selection of students taking the test; 

o Lack of comparable scores from a variety of grade levels; and 

o Narrowness of the range of subjects covered (only two tests are 
administered--mathematics and verbal aptitude).

Enough is known about self-selection of students taking the SAT to 
know that those taking it are not representative of high school seniors in 
general. Not enough is known, however, to control fully for the
nonrepresentativeness of the SAT sample. On the other hand, while
compositional changes--that is, changing self-selection--played a major role in the
earlier (pre-1970) part of the decline in average SAT scores, they apparently
have had only small effects in recent years. Moreover, they do not account
for the turnaround in SAT scores, the timing of which is the most important
aspect of the data for testing the cohort and period models. 6/

The end of the SAT decline fits the cohort pattern suggested by the 
Iowa data very closely. Both the mathematics and verbal scales of the SAT 
reached their minimums in the 1979-1980 school year, remained at that
level for one more year, and then began their increases in the 1981 school
year. Thus, the lowest scores reflect primarily the birth cohorts of 1962 and
1963, and the upturn began with the birth cohort of 1964 (see Figure B-3).



5. Grade six is ambiguous. It reached its low point somewhere between the birth cohorts
of 1963 and 1965.

6. This point is discussed more fully in Congressional Budget Office, Educational
Achievement: Explanations and Implications of Recent Trends (forthcoming).
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Figure B-3. 

Average SAT Scores 
(By birth year and 
subject; differences 
from lowest year) 
The American College Testing Program (ACT) Tests

The ACT tests are also intended as college admissions tests, although they 
differ substantially from the SAT in format and content. The principal 
advantages and disadvantages of the ACT scores for present purposes are 
largely similar to those of the SAT. The ACT has the additional advantage,
however, of covering a wider range of subjects: natural science and social
studies, in addition to mathematics and English. 

The end of the ACT decline is relatively clear-cut and is not 
consistent with the cohort pattern shown by the Iowa and SAT data.
Average scores on the English and social studies tests bottomed out with the 
birth cohort of 1958, which was several cohorts earlier than those that 
Figure B-4.



ACT Scores 
(By birth year and 
subject; differences 
from lowest year) 
SOURCE: CBO calculations based on L. A. Munday, Declining Admissions Test Scores (Iowa City: American
College Testing Program, 1976), Table 3; and American College Testing Program, National Trend Data for
Students Who Take the ACT Assessment (Iowa City: ACT, undated).



produced the lowest scores on the ITBS, ITED, or SAT (see Figure B-4). The
mathematics trend is less clear. The major decline in scores ended with the
birth cohort of 1959, but average scores moved down further, albeit slightly
and erratically, until the 1965 birth cohort.

The ACT data also do not show the pronounced upturn in scores that 
characterizes the post-1963 birth cohorts in the SAT and Iowa data. Since
the 1958 birth cohort, scores on the ACT test have fluctuated, showing only
small and inconsistent increases (see Figure B-4). On the other hand, since
the birth cohort of 1965--one to three years after the cohorts marking the
bottom of the Iowa and SAT trends--the ACT tests have shown a fairly
clear, but still very small, increase.
The New York State Assessment Data

New York State administers a wide range of tests to students of various 
ages, one of which provides a good test of the cohort and period models. In
general, this one test conforms to the cohort model, showing timing that is
largely consistent with that shown by the Iowa data and the SAT.

The Pupil Evaluation Program (PEP), begun in 1965, includes tests of 
reading and mathematics administered in grades three and six. Until
recently, a norm-referenced test was used, and comparable annual data are
available for spans of up to 16 years. Because the test is used to screen
students requiring remedial services, the results are often tabulated in
terms of the proportion of students falling below a threshold used for that
purpose--the "state reference point." 7/

Three of the four tests--reading at both grade levels, and mathematics
at grade six--conform to the cohort model suggested by the ITBS, the ITED,
and the SAT. These three tests stopped declining with the birth cohort of
1962 and began improving markedly within a few years (see Figure B-5).
Because the numbers are rounded and show no change for periods of two or
three years before the upturn, the improvement might actually have begun
with the cohorts a year or even two years earlier than 1963 or 1964, but that
would still leave the timing consistent with the upturn suggested by the Iowa
and SAT data. On the other hand, the proportion of students scoring above
the reference point on the grade three mathematics test has been increasing
almost without exception since the birth cohorts of the late 1950s. This
exception is perhaps to be expected, however, given the general absence of 
sizable score declines in the earliest grades. 



The California State Assessment Tests 

Average scores of twelfth-grade students in the California state assessment
program fail to confirm either the cohort or period model, since they show
very little change in any of the four subjects tested (see Figure B-6). The
only appreciable year-to-year changes occurred between 1974 and 1975 (the 
birth cohorts of 1957 and 1958), and these changes were inconsistent in 
direction among subjects. 



7. Division of Educational Testing, Student Achievement in New York State, 1982-83
(Albany: New York State Education Department, January 1984).
Figure B-5.

Percent of New York
Students Scoring
Above Reference Point
(By birth year, grade,
and subject)
Figure B-6.

California State Assessment Test Scores (By birth year, 
grade, and subject) 
Grade six scores from the California assessment also provide no
support for either model (see Figure B-6). The birth cohort of 1964 scored
substantially above the preceding cohort, but scores have risen only a small
amount since then. Since the test was altered in the year that the 1964
cohort took the test (1975), this one-year increase in scores is likely to be a
result of differences in tests rather than differences between cohorts.



The Virginia State Test Data 

Data are available for the Virginia statewide assessment of fourth-, eighth-, 
and eleventh-grade students since 1972. During the seven-year period from
1974-1975 through 1980-1981, a single edition of one test (the 1971 edition
of the SRA) was used. Because the same set of norms was used for scoring,
the yearly averages from that time span can be compared with each other.

The Virginia assessment data provide a weaker test of the cohort and
period models than do the data series above, but they provide a stronger test 
than do some of the intermittent data series discussed below. The relevant 
fourth-grade data begin only with the 1965 birth cohort, which is too recent 
to show the end of the decline if the cohort model is correct. The eighth 
grade data do span the end of the decline, but only barely; the first data 
point is the 1961 birth cohort. The eleventh grade data span the end of the 
decline nicely but lack information for the birth cohort of 1961. 

Given these limitations, the composite scores from the Virginia data 
appear to conform closely to the cohort model (see Figure B-7). Among
eleventh graders, the low point appears to have occurred with the birth
cohorts of 1961 or 1962, although the large increase between the 1958 and
1959 birth cohorts calls the stability of the scores into question. The
average scores of eighth graders appear to have reached their low point with
the 1962 birth cohort, though the absence of data before the 1961 cohort
leaves some doubt about that. Finally, fourth-grade scores have been
increasing from the first year of data, which is consistent with the cohort
model. Since the earliest data are for the 1965 cohort, however, this fact
offers the model only weak support. Scores on the specific subject-area
tests that enter into the composite scores (reading, mathematics, and 
science) show largely similar trends, except that the upturn among eighth 
graders is less clear-cut in reading. 

The National Assessment of Educational Progress (NAEP) 

The NAEP data reflect assessments at intervals of up to five years. As a 
result, they provide only a weak test of the cohort and period models. They 
cannot pinpoint the year in which the decline ended or even confirm that 
Figure B-7.

Virginia Composite Achievement (By birth year and grade) 
there was only one recent change in the direction of the trend--that is, only
one recent period each of decline and upturn.

For example, the NAEP mathematics scores of 13-year-olds reached
their lowest recorded average with the assessment of 1977--that is, with the
birth cohort of 1964 (see Figure B-8). The true low point, however--
assuming that there was only one--might have occurred with any of the birth
cohorts from 1960 through 1967. For the low point to have occurred within
a few years of the tested cohorts of 1959 or 1968 is unlikely, for that would
have required very abrupt changes in average scores, but a considerable
range of alternatives to the apparent low of 1964 remains plausible.
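
The range of cohorts left open by intermittent testing follows directly from the tested cohorts. The sketch below illustrates that bound; the function name is hypothetical, and the calculation assumes, as the text does, that there was only one low point.

```python
def plausible_low_range(tested_cohorts, observed_low):
    """Return (earliest, latest) untested-or-observed cohort that could hold
    the true low point, given that the observed low is at `observed_low`.

    Assumes a single low point, so the true minimum must lie strictly
    between the tested cohorts on either side of the observed low.
    A bound is None when there is no tested cohort on that side.
    """
    cohorts = sorted(tested_cohorts)
    i = cohorts.index(observed_low)
    earliest = cohorts[i - 1] + 1 if i > 0 else None
    latest = cohorts[i + 1] - 1 if i < len(cohorts) - 1 else None
    return earliest, latest


# Tested birth cohorts of 13-year-olds as cited in the text.
print(plausible_low_range([1959, 1964, 1968], 1964))
```

The wider the gap between assessments, the wider this interval becomes, which is why intermittent series provide only weak tests of the two models.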
Figure B-8.

NAEP Mathematics Scores (By birth year and age)

[Three panels: Age 17, Age 13, and Age 9. Vertical axis: difference from lowest score. Horizontal axis: birth year.]

SOURCE: CBO calculations based on National Assessment of Educational Progress, The Third National
Mathematics Assessment: Results, Trends, and Issues (Denver: NAEP/Education Commission of the
States, 1983).



Moreover, the NAEP data are not entirely consistent--even within
these limits--with either the cohort or the period model. On balance, the
data seem more consistent with a cohort model and suggest an upturn that
began, as in the SAT, Iowa, Virginia, and New York data, with the birth
cohorts of the first half of the 1960s. There are enough exceptions,
however, that some observers might disagree with this generalization.

Of the NAEP data, the mathematics results are least consistent with a
cohort model and, conversely, most supportive of a period interpretation
(see Figure B-8). In the case of both 9- and 13-year-olds, the lowest
average score occurred in the 1977 assessment--that is, with the birth
cohorts of 1968 and 1964, respectively. This pattern is entirely consistent
with a period model. The actual lowest points, however, might have
occurred in years when there was no assessment and thus might differ
between the two age groups. In the case of 17-year-olds, the low point
was marked by both the 1977 and 1981 assessments, since the average scores
in those two years were effectively equal. On the other hand, the data from
the 13- and 17-year-old groups--but not those from the 9-year-olds--are also
consistent with a cohort model. If the cohort model pertains, these data
suggest that the minimum occurred with the birth cohorts of the first half
of the 1960s--perhaps in the range of 1961 through 1965.

The NAEP science and reading assessments are somewhat more
supportive of the cohort model, although in these subjects also the patterns
are not clear-cut. The science data, regardless of age, provide no indication
of further sizable drops after the birth cohort of 1963, although the absence
of comparable tabulations from the most recent assessment calls this into
doubt and leaves open the possibility of a period effect (see Figure B-9).
The NAEP assessments never showed a sizable decline for reading as a
whole, but the reading data do suggest that average achievement began
rising with the birth cohorts of the early 1960s or late 1950s (see Figure
B-10). (The scores of 13-year-olds are in this case a rare exception in
suggesting the possibility of an upturn that began before the cohorts of the
1960s.) The NAEP assessment of inferential comprehension in reading--
which, unlike the data for reading as a whole, did show a decline--also is
consistent with the view that the decline ended and the upturn began with
the cohorts of the early 1960s (see Figure B-11).



The ITBS National Norming Data 

The ITBS, like most commercial standardized elementary and secondary 
tests, is renormed approximately once every seven years. The ITBS norming
data reported here, unlike the ITBS-IA data described above, are based on
national samples of students. 8/



8. Although norming data need not be useful in assessing national trends in test scores,
the norming of the ITBS and certain other tests does yield valuable information on
trends. The principal purpose of renorming is to estimate the national distribution
of scores on a new version of the test, so that districts using the test have an updated
national standard against which to judge their own scores. This objective does not
necessitate equating the old and new versions of the test. The two versions often are
equated, however, and the results of the equating provide an estimate of the change 
in the national distribution of scores. All ITBS norming results have been equated to 
previous norming-sample results. 

Equated national norming data are available for the ITED as well but are not discussed 
here. The ITED averages declined between the two most recent normings (1971 and 
1978), but there has been no renorming since then. As a result, there is as yet no evidence 
of the overall upturn in scores. Lacking that, the ITED norming data provide no
information on the timing of the decline's end. 
Figure B-9.

NAEP Science Scores (By birth year and age)

[Three panels: Age 17, Age 13, and Age 9. Vertical axis: difference from lowest score. Horizontal axis: birth year.]

SOURCE: CBO calculations based on National Assessment of Educational Progress, Three National Assessments of
Science: Changes in Achievement, 1969-77 (Denver: NAEP/Education Commission of the States, 1978).



Figure B-10.

NAEP Reading Proficiency Scores (By birth year and age)
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Figure B-11. 

NAEP Reading (Inferential Comprehension) Scores 
(By birth year and age) 
Norming data offer even weaker evidence than does the NAEP for
testing the cohort and period models. Like the NAEP and all other
intermittent data, norming data cannot precisely pinpoint the timing of a
turnaround in achievement trends. In addition, norming data usually have
even longer gaps between test years than those in the National Assessment
(most often, seven years). These long gaps further exacerbate the
uncertainty. Norming data generally also entail testing all grade levels in
all subjects at the same time. In conjunction with the long period between
renormings, this factor can force the trend data to appear to be a period
effect even if the true underlying pattern is a cohort effect. 9/



9. The extent of this bias depends on the time span between normings, the range of grades
tested, the number of years between a given norming and the true minimum in the trend
data, and the slope of the curves on both sides of the minimum. For example, suppose
that grades four through six are tested in 1972 and 1979 and that the true trend is a
cohort model, with grade four reaching its low point in 1972, grade five in 1973, and
so on. If the declines and upturns in each grade are reasonably similar in severity, all
three grades will show their lowest scores in the 1972 norming sample. If, however,
the testing continues through grade 12, the older grades--beginning with grade eight
or nine--would probably show their lowest scores in the 1979 norming sample.
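
The example in this note can be reproduced with a small simulation. It is a sketch under stated assumptions, not a model fitted to any data: the V-shaped trend with equal slopes on both sides, the slope value, and the rule linking test year and grade to birth cohort are all hypothetical choices made for illustration.

```python
def cohort_score(cohort, low_cohort=1963, slope=1.0):
    # Hypothetical V-shaped trend: scores depend only on birth cohort and
    # rise symmetrically on either side of the lowest-scoring cohort.
    return slope * abs(cohort - low_cohort)


def lowest_norming(grade, norming_years=(1972, 1979)):
    # Assumed convention: birth cohort = test year - grade - 5, so that
    # grade four reaches its low point in 1972 when the low cohort is 1963.
    return min(norming_years, key=lambda year: cohort_score(year - grade - 5))


# Which of the two normings shows the lower score in each grade?
for grade in range(4, 13):
    print(grade, lowest_norming(grade))
```

Under these assumptions grades four through seven all show their lowest score in the 1972 norming, which looks like a period effect, while grades eight and above show it in 1979. Changing the relative slopes moves the switch point by a grade or so, in line with the "grade eight or nine" in the note.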
Taken together, the ITBS norming data can be seen as consistent with
either a period or a cohort model. But they do suggest--albeit weakly--that
if the cohort model is correct, the low point might be a few cohorts later
than in the Iowa, SAT, and New York data. In all grades from fourth
through eighth, the scores of students in the norming sample reached their
lowest observed levels with the norming of 1977-1978, corresponding to the
birth cohorts of 1964 through 1968 (see Figure B-12). 10/ If the decline
reached its end with the birth cohort of 1963, for example, one might expect
fourth- and fifth-grade scores to be lowest in the prior (1970-71) norming.

The California Test of Basic Skills (CTBS) Norming Data 

An equating study of the most recent (1973 and 1980) normings of the CTBS
provides a somewhat stronger test of the two models, for the large span of
grades tested (first through twelfth) in part compensates for the long
interval between the two test dates.

If the cohort model and the timing suggested by the Iowa, SAT, and
New York data are correct, the CTBS data should show increases that are
sizable in the elementary grades, gradually decrease in size in the
junior-high grades, and are replaced by declines in the senior-high grades. In grade
five and below, both norming samples comprise cohorts born in 1963 or
later; that is, cohorts that produced increasing scores in the other data
bases. In grades 6 through 11, the norming samples comprise varying mixes
of post-1963 and pre-1963 birth cohorts, and the increases among the former
should tend to offset the declines among the latter. Finally, both grade-12
samples were born in 1963 or earlier, so if the decline ended in 1963, the
change at that grade level would reflect only years of declining
achievement.
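The grade-by-grade predictions above can be sketched in a few lines of Python. This is only an illustration, not a procedure from the paper: the function names are hypothetical, and the five-year offset between school year, grade, and birth year is an assumption inferred from the grade and cohort pairings stated in the text (for example, fifth graders in the 1973 norming being the 1963 birth cohort).

```python
# Sketch only. BIRTH_OFFSET is an assumption inferred from the text's
# grade/cohort pairings; it is not a figure given in the paper.
BIRTH_OFFSET = 5


def birth_cohort(school_year: int, grade: int) -> int:
    """Approximate birth year of students in `grade` during `school_year`."""
    return school_year - grade - BIRTH_OFFSET


def expected_change(grade: int, first_norming: int, second_norming: int,
                    low_cohort: int = 1963) -> str:
    """Change the cohort model predicts between two normings of one grade."""
    early = birth_cohort(first_norming, grade)
    late = birth_cohort(second_norming, grade)
    if early >= low_cohort:   # both samples born at or after the low point
        return "increase"
    if late <= low_cohort:    # both samples born at or before the low point
        return "decline"
    return "mixed"            # samples straddle the low point


# CTBS normings of 1973 and 1980, grades 1 through 12.
ctbs = {grade: expected_change(grade, 1973, 1980) for grade in range(1, 13)}
for grade, label in ctbs.items():
    print(f"grade {grade:2d}: {label}")
```

Under these assumptions the sketch labels grades one through five "increase," grades 6 through 11 "mixed," and grade 12 "decline," matching the pattern described in the text. The same function can be applied to other pairs of norming years by changing the two year arguments.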

The changes in the CTBS norming samples largely conform to these
predictions from the cohort model. With one exception, all comparisons at
grade nine and below showed increases from 1973 to 1980, with a tendency
for the largest gains to be in the lowest grades. For example, in the fall
testing, the achievement of a third-grade student scoring at the 34th
percentile in 1980 corresponded roughly to that of the median student in
1973, while in grade eight, a student would have had to reach the 46th
percentile to score at the level of the median student of seven years earlier.



10. In grade three, average scores increased with every norming sample after the initial
(1955) one, mirroring the negligible decline in third-grade scores in the ITBS-Iowa data,
and are excluded from this discussion.

Figure B-12.

ITBS National Norming Data (By birth year and grade)

In contrast, students in the eleventh and twelfth grades showed a drop in
achievement during that period. 11/
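The percentile comparisons for the CTBS can be translated into approximate standard-deviation units. The sketch below is an illustration rather than a calculation from the paper: it assumes that scores are normally distributed with the same spread in both norming years, and the function name is hypothetical.

```python
from statistics import NormalDist


def gain_in_sd(percentile_now: float) -> float:
    """Gain in SD units implied when a student at `percentile_now` of the
    later norming matches the median of the earlier norming.

    Assumes normal score distributions with equal standard deviations.
    """
    return -NormalDist().inv_cdf(percentile_now / 100.0)


grade3_gain = gain_in_sd(34)  # third grade: 34th percentile matches old median
grade8_gain = gain_in_sd(46)  # eighth grade: 46th percentile matches old median
print(f"grade 3: about {grade3_gain:.2f} SD")
print(f"grade 8: about {grade8_gain:.2f} SD")
```

Under these assumptions the third-grade gain is roughly 0.4 standard deviation and the eighth-grade gain roughly 0.1, consistent with the tendency for the largest gains to appear in the lowest grades.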



The California Achievement Tests (CAT) Norming Data 

The 1970 and 1977 normings of the CAT were equated to each other and can
be used in the same way as the 1973 and 1980 CTBS to test the cohort and
period models. The two editions of the CAT, however, were more dissimilar
from one another, making the procedure riskier.

Because the CAT was renormed three years earlier than the CTBS, one 
would expect the observed changes to switch from increases to decreases 



11. California Test Bureau, McGraw-Hill, unpublished tabulations. The most salient
exception to this pattern occurred among ninth-grade students, who showed larger gains
than any students above grade three.

three grades younger. Specifically, if the decline followed the cohort
model and reached its low point with the 1963 birth cohort, one would
expect grades nine and above to reflect only years of decline, while grades
three through eight would reflect varying mixes of increasing and decreasing
years. Only grades one and two would reflect solely increasing years, and
those are grades in which the decline appears never to have occurred.

The results of the CAT renorming study largely conform to these
predictions based on the cohort model. Grades one and two showed gains of
over 0.6 standard deviation. These increases rapidly tapered off with
increasing age, so that grades five and six showed essentially no change.
Grades seven and eight showed declines of less than 0.2 standard deviation,
while the higher grades all showed drops larger than 0.3 standard
deviation. 12/



AN AGGREGATE TEST 

OF THE COHORT AND PERIOD MODELS 



Another method of testing the cohort and period models is to assess
which model yields the least variable estimates of the timing of the end of
the decline, considering only those continuous data bases that show a clear
low point. That is, the timing of the decline's end can be estimated in
terms of both test years and birth cohorts, and the relative variation in
those estimates indicates which of the models fits the data more closely.
This approach, however, suffers from the relatively small number of data
bases that can be applied.

Among the data bases meeting these criteria, the cohort model fits
the data more closely than does the period model (Table B-1). The end of
the decline, expressed in test years, showed a mean of 1976 and a 12-year
range (from 1970 to 1982). When expressed in terms of birth cohorts, the
estimates had a mean of 1962 and a range of only seven years (from
1958 to 1965). The standard deviation of the estimate is roughly 60 percent
larger when test years are used. 13/
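The aggregate test can be illustrated with a short Python sketch. The low points below are hypothetical values chosen to follow a cohort pattern; they are not the entries of Table B-1. The five-year offset between school year, grade, and birth year is likewise an assumption, and the choice of population rather than sample standard deviation is arbitrary. The last lines check the ratios implied by the summary statistics reported in Table B-1.

```python
from statistics import pstdev

BIRTH_OFFSET = 5  # assumed: birth year = school year - grade - 5

# Hypothetical (grade, school year of lowest score) pairs, NOT Table B-1 data.
hypothetical_lows = [(4, 1972), (6, 1974), (8, 1975), (10, 1978), (12, 1980)]

test_years = [year for _, year in hypothetical_lows]
birth_years = [year - grade - BIRTH_OFFSET for grade, year in hypothetical_lows]

# The model whose timing estimates vary less fits the data more closely.
sd_test = pstdev(test_years)
sd_birth = pstdev(birth_years)
better_fit = "cohort" if sd_birth < sd_test else "period"
print(f"SD in test years:  {sd_test:.2f}")
print(f"SD in birth years: {sd_birth:.2f}")
print(f"closer fit: {better_fit} model")

# Ratios of the standard deviations reported in Table B-1.
ratio_with_act = 2.7 / 1.7      # including the ACT
ratio_without_act = 2.5 / 0.73  # excluding the ACT
print(f"with ACT:    {ratio_with_act:.2f}")
print(f"without ACT: {ratio_without_act:.2f}")
```

The first ratio is about 1.6, in line with the statement that the standard deviation is roughly 60 percent larger when test years are used; the second is a little over 3.4.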



12. California Test Bureau, McGraw-Hill, unpublished tabulations.

13. There are ambiguities, as noted above, in specifying single years as
the low points of these series, and these uncertainties apply to the patterns
shown in Table B-1 as well. The most striking ambiguity entails the ACT mathematics
assessment, which continued to decline, though slightly and inconsistently, for several
years after the substantial decline ended. Table B-1 uses the year that the substantial
decline ended. Substituting the year that the decline in mathematics scores ended
(1978), however, would not alter the conclusions. While it would make the relative
fit of the cohort and period models more similar, the cohort model would still fit
appreciably better.

The closer fit of the cohort model is much more striking if the ACT is
excluded. The ACT is anomalous in two respects among continuous data
bases showing an achievement decline: the early end of its decline and the
lack of a subsequent upturn. Because these anomalies are unexplained,
retesting the cohort model without the ACT seems warranted. When the
ACT is excluded, the test years marking the end of the decline show a
nine-year range (from 1970 to 1979), while the birth cohorts show only a
three-year range (from 1961 to 1964). Similarly, the difference between the
test-year and cohort-year standard deviations is much larger: the former is
nearly 3.5 times as large as the latter.

TABLE B-1.  TIMING OF THE END OF THE ACHIEVEMENT DECLINE,
            BY TEST (Test years and birth years of group
            showing lowest score) a/



                                                  Test      Birth
Test                                    Grade     Year      Year



ACT Mathematics
ACT English
ACT Social Studies
SAT Verbal
SAT Mathematics
ITED Iowa Comprehensive
ITED Iowa Comprehensive
ITED Iowa Comprehensive
ITED Iowa Comprehensive
ITBS Iowa Comprehensive
ITBS Iowa Comprehensive
ITBS Iowa Comprehensive
ITBS Iowa Comprehensive
Virginia Comprehensive
Virginia Comprehensive b/
New York Reference-Point Mathematics
New York Reference-Point Reading
New York Reference-Point Reading
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1963 
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1963 
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Variability of Estimates 

Including the ACT

  Mean                                            1976      1962
  Standard Deviation (in years)                   2.7       1.7
  Minimum                                         1970      1958
  Maximum                                         1982      1965
  Range (in years)                                12        7



(Continued) 
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TABLE B-l. (Continued) 



                                                  Test      Birth
Test                                    Grade     Year      Year



Excluding the ACT

  Mean                                            1976      1962
  Standard Deviation (in years)                   2.5       0.73
  Minimum                                         1970      1961
  Maximum                                         1979      1964
  Range (in years)                                9         3



SOURCES: CBO calculations based on American College Testing Program, National
Trend Data for Students Who Take the ACT Assessment (Iowa City: ACT,
undated); The College Entrance Examination Board, National College-Bound
Seniors, 1985 (New York: The College Board, 1986); "Mean ITED Test Scores
by Grade and Subtest for the State of Iowa" (Iowa Testing Programs,
unpublished and undated tabulations); "Iowa Basic Skills Testing Program,
Achievement Trends in Iowa: 1955-1985" (Iowa Testing Programs,
unpublished and undated material); S. John Davis and R. L. Boyer,
Memorandum to Division Superintendents: State Testing Program Results,
1980-81 (Richmond: Commonwealth of Virginia Department of Education,
1981); Division of Educational Testing, Percent of Pupils Scoring Below State
Reference Point on Pupil Evaluation Program Tests (Albany: New York State
Education Department, undated).

a. End is last year before increase or stability. See text for explanation of ambiguities
involved in specifying one year as the low point in these series.

b. The low point could be either the 1977 or 1978 test years; no data are available for 1977.
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APPENDIX C 



DIFFERENCES IN TRENDS 
BY SUBJECT AREA 



As discussed in Chapter III, among all of the tests considered in this paper,
no single subject area consistently showed the most severe decline in
average scores. Nor was the decline consistently more substantial in either
"directly" or "indirectly" taught subjects. This appendix provides the
information on which those conclusions are based.

Not all of the data sources discussed in this paper could be used for
making comparisons among subject areas. Only those tests that included
more than one subject area and that could be converted to standard
deviations (SDs) could be used, since only in those instances could the
relative size of the decline among subject areas be ascertained. The most
serious omission for this reason is the National Assessment of Educational
Progress; the NAEP staff did not return sufficient information on SDs to
convert published raw scores.

This appendix includes data from tests administered both annually and
less frequently, but comparisons among subject areas often have a somewhat
different meaning in the two cases. When annual data are available, the
beginning and end of the decline in each subject can be ascertained, and the
tabulations in this appendix represent the total amount of each decline,
regardless of its duration. In those instances, the largest decline need not
be the most rapid. A subject showing a slower decline than others, for
example, can drop more in total if its decline is sufficiently long in duration.
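The distinction between the total size of a decline and its speed can be shown with a small worked example in Python. The rates and durations below are hypothetical, and the sketch assumes a constant annual rate of decline, which the paper does not claim.

```python
def total_decline(rate_per_year: float, years: int) -> float:
    """Total drop (in standard deviations) at a constant annual rate."""
    return rate_per_year * years


# Hypothetical subjects: one declines slowly for a long time,
# the other declines faster but for fewer years.
slow_long = total_decline(rate_per_year=0.03, years=12)
fast_short = total_decline(rate_per_year=0.05, years=6)

print(f"slow, 12 years: {slow_long:.2f} SD in total")
print(f"fast, 6 years:  {fast_short:.2f} SD in total")
```

In this illustration the slower decline is the larger one in total, which is why tabulations of total decline should not be read as rankings of the speed of decline.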

In the case of tests administered less often than annually, however,
the beginning and end of the decline cannot be pinpointed. In those
instances, the tabulations in this appendix represent the amount scores
dropped during a fixed period for all subjects in one test battery; for
example, the period between two normings, or between the National
Longitudinal Study (1971) and the High School and Beyond study (1979). 1/ If
the period used does not include years of rising scores, these comparisons
indicate the relative rate of decline among subject areas, as well as the


1. The NLS and the HSB tests were administered in the springs of 1972 and 1980,
respectively; that is, in the 1971 and 1979 school years.

amount of the decrease over that period. The comparisons need not,
however, indicate the relative total decline among different subjects, since
they cannot take into account differences in the duration of the decline.
Moreover, because the time span used can encompass varying periods of
rising scores, these comparisons are less reliable than those based on
annual data. 2/

The majority of the tests considered here showed the largest declines
on language-related subtests, but the exceptions were frequent enough to
suggest that this ranking is more a reflection of the attributes of individual
tests than an underlying consistency in the achievement trends (see
Table C-1). In addition to the SAT, test batteries that showed the greatest
decline in language-related tests include the NLS and HSB comparison, the
grade 12 Iowa state data (ITED-Iowa), the Illinois Decade Study, and, for the
most part, the Project TALENT 15-year comparison (1960 and 1975). In
contrast, Iowa state elementary school data (ITBS-Iowa) show the opposite
pattern: the decline in mathematics was much more severe than that in any
of the language-related subjects. Senior high school norming data for the
California Achievement Test (CAT-US) also show a greater decline in
mathematics than in other areas. Other test batteries, such as the national
norming data for the elementary-level Iowa test battery (ITBS-US), show a
more complex pattern, with the various language-related tests bracketing
the mathematics test in terms of the magnitude of the decline. The ACT
showed a slightly larger decline in English than in mathematics. It also
showed its largest decline in social studies, however, and no decline at all in
science.

The various tests are also inconsistent in terms of the relative declines
in "directly taught" and "indirectly taught" subjects. Some of the
language-related tests that showed particularly steep declines, such as the vocabulary
tests in the Project TALENT data and the NLS-to-HSB comparison, might
be viewed as being largely indirectly taught subjects. Other
language-related tests that declined markedly, however, presumably are much more
reliant on formal instruction, such as the language test in the national ITBS
data and the expression test in the national ITED data, both of which are
tests of language usage. In addition, mathematics, which has been used as
an example of a directly taught subject, showed the steepest decline in
several test batteries.



2. In the case of tests administered less often than annually, the tabulations used here
are based on a single interval during which all subjects evidenced declines. If an adjacent
interval showed declines in some subjects but not others, as was the case, for example,
with the grade-eight ITBS norming data, that adjacent period was ignored.

TABLE C-1.  MAGNITUDE OF THE ACHIEVEMENT DECLINE,
            BY SUBJECT

                                                                    Total Decline
                                                                    (Standard
Test                        Grade       Subject                     Deviations)

SAT                         12          Verbal                       0.48
                            12          Mathematics                  0.28

NLS to HSB                  12          Vocabulary                   0.22
                            12          Reading                      0.21
                            12          Mathematics                  0.14

ITED-US                     12          Expression                   0.28
                            12          Mathematics                  0.28
                            12          Vocabulary                   0.23

ITED-US                     10          Mathematics                  0.32
                            10          Expression                   0.29
                            10          Vocabulary                   0.22

ITED-Iowa                   12          Reading a/                   0.40
                            12          Social Studies               0.36
                            12          Expression                   0.32
                            12          Vocabulary                   0.30
                            12          Science                      0.28
                            12          Mathematics                  0.27

ITED-Iowa                   10          Reading a/                   0.32
                            10          Mathematics                  0.31
                            10          Expression                   0.29
                            10          Social Studies               0.27
                            10          Vocabulary                   0.26
                            10          Science                      0.25

ITBS-Iowa                   8           Mathematics                  0.47
                            8           Language                     0.37
                            8           Reading                      0.36
                            8           Vocabulary                   0.26

ITBS-Iowa                   6           Mathematics                  0.38
                            6           Language                     0.25
                            6           Reading                      0.17
                            6           Vocabulary                   0.10

ITBS-US                     8           Language                     0.32
                            8           Mathematics                  0.28
                            8           Vocabulary                   0.28
                            8           Reading                      0.20

ITBS-US                     6           Language                     0.32
                            6           Mathematics                  0.28
                            6           Vocabulary                   0.19
                            6           Reading                      0.17

CAT-US                      12          Mathematics                  0.34
                            12          Reading Comprehension        0.24
                            12          Vocabulary                   0.23
                            12          Language                     0.18

CAT-US                      9           Mathematics                  0.30
                            9           Language                     0.28
                            9           Vocabulary                   0.21
                            9           Reading Comprehension        0.05

ACT                         12          Social Studies               0.55
                            12          Mathematics                  0.42
                            12          English                      0.37
                            12          Science                     -0.06

Illinois Decade             11          English 2                    0.49
                            11          English 1                    0.38
                            11          Social Studies               0.36
                            11          Math 2                       0.26
                            11          Science                      0.19
                            11          Math 1                       0.05

TALENT 15-Year              9,10,11     Vocabulary                   0.40
Follow-Up                   9,10,11     English                      0.30
                            9,10,11     Quantitative Reasoning       0.22
                            9,10,11     Reading Comprehension        0.06
                            9,10,11     Computation                  0.23
                            9,10,11     Mathematics                 -0.07
                            9,10,11     Abstract Reasoning          -0.24
                            9,10,11     Creativity                  -0.34



SOURCES: CBO calculations based on Hunter M. Breland, The SAT Score Decline: A
Summary of Related Research (New York: The College Board, 1976); The
College Entrance Examination Board, National College-Bound Seniors, 1978
and 1985 (New York: The College Board, 1985); Donald A. Rock, Ruth B.
Ekstrom, Margaret E. Goertz, Thomas L. Hilton, and Judith Pollack, Factors
Associated with Decline of Test Scores of High School Seniors, 1972 to 1980
(Washington: Center for Statistics, U.S. Department of Education, 1985);
Robert Forsyth, Iowa Testing Programs, personal communications, April
1984; "Mean ITED Test Scores by Grade and Subtest for the State of Iowa"
(Iowa City: Iowa Testing Programs, undated and unpublished tabulations);
"Iowa Basic Skills Testing Program, Achievement Trends in Iowa: 1965-1985"
(Iowa City: Iowa Testing Programs, undated and unpublished tabulations);
A. N. Hieronymus, E. F. Lindquist, and H. D. Hoover, Iowa Tests of Basic
Skills: Manual for School Administrators (Chicago: Riverside, 1982); The
Development of the 1982 Norms for the Iowa Tests of Basic Skills (Chicago:
Riverside, 1983); CTB/McGraw-Hill, unpublished tabulations, December
1977; L. A. Munday, Declining Admissions Test Scores (Iowa City: American
College Testing Program, 1976); American College Testing Program, National
Trend Data for Students Who Take the ACT Assessment (Iowa City: ACT,
undated); Student Achievement in Illinois, 1970 and 1981 (Springfield: Illinois
State Board of Education, 1983); John C. Flanagan, "Analysing Changes in
School Levels of Achievement Using Project TALENT Ten- and Fifteen-Year
Retests," in G. R. Austin and H. Garber (eds.), The Rise and Fall of National
Test Scores (New York: Academic Press, 1982), pp. 35-49.

NOTE: This table is limited to data that span a sizable portion of the decline and
that permit exclusion of the subsequent upturn. Only selected grade levels
are presented for the sake of simplicity.

a. This is the "Interpretation of Literary Materials" test. Reading skills are also measured
by the ITED social studies and science tests.
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VARIATION AMONG ACHIEVEMENT SUBGROUPS 



As discussed in Chapter IV, there is inconsistent evidence about the
relative trends in test scores among different achievement subgroups; that
is, among groups of students categorized by their differing levels of
achievement. Because this issue has received considerable public attention, 
and because the conclusions presented in the paper are not entirely in 
keeping with those presented by some other writers, this appendix provides 
additional detail about the evidence that underlies the following five 
generalizations, presented in Chapter IV: 

o The achievement decline and the subsequent upturn occurred
among both low- and high-achieving students.

o During the mid- and late 1970s (that is, during the end of the
achievement decline and the beginning of the subsequent upturn),
students in the top achievement quartile on the National Assessment
of Educational Progress (the top fourth of all students, when
ranked by achievement) lost ground relative to those in the
bottom quartile.

o Other data, however, do not consistently suggest a narrowing gap
between the top and bottom achievement quartiles. The narrowing
evident in the NAEP data might be limited to the short time
period of that particular assessment (roughly half of the 1970s), or
it might be limited to certain types of tests. Alternatively, more
detailed analyses than those now available might show the
narrowing to be a more general pattern.

o Test scores of students taking college-admissions tests (currently,
about half of all high-school graduates) declined more than those
of high school seniors in general, but this difference primarily
reflects the changing composition of the group taking those tests
rather than a greater decline in achievement among high-achieving
students.



o Select students (those scoring highest on tests, taking the most
advanced courses, and so on) experienced both the decline and
the subsequent upturn in achievement. Select students did not
show a consistently greater decline than the average student.
Indeed, by some measures, select students appear to have gained
relative to the average, particularly in the area of mathematics.
The sketchiness and inconsistency of data on select students,
however, cloud these conclusions.

As noted in Chapter IV, however, both differences and similarities
among trends in achievement subgroups must often be taken with a grain of
salt. They can be simple artifacts of technical aspects of the tests used;
specifically, the scaling of the test, its content, and the measure of change
that is reported. For example, if both the top and bottom achievement
quartiles show a decline of 5 percentage points in the average number of
test items answered correctly, these seemingly equivalent changes could in
fact reflect very different real changes in skills. The change would be
proportionately larger in the bottom quartile. Moreover, the typical
students in each quartile answer very different questions correctly, and only
detailed information about the content and difficulty of the additional items
answered incorrectly by each quartile would indicate whether the losses of
skills in each group are qualitatively or quantitatively similar. 1/ Technical
solutions to this ambiguity are complex and have rarely been applied to the
specific question of relative trends among different achievement subgroups.
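The point about proportional change can be made concrete with a short Python example. The baseline percentages below are hypothetical; the paper gives only the 5-percentage-point decline, not the starting levels of either quartile.

```python
def proportional_change(before: float, after: float) -> float:
    """Change from `before` to `after`, as a percent of `before`."""
    return (after - before) / before * 100.0


# Hypothetical average percent of items answered correctly before the decline.
top_before = 80.0
bottom_before = 40.0
drop = 5.0  # the same decline, in percentage points, for both quartiles

top_pct = proportional_change(top_before, top_before - drop)
bottom_pct = proportional_change(bottom_before, bottom_before - drop)

print(f"top quartile:    {top_pct:.2f} percent")
print(f"bottom quartile: {bottom_pct:.2f} percent")
```

With these assumed baselines, the identical 5-point drop is a 6.25 percent decline for the top quartile but a 12.5 percent decline for the bottom quartile. Neither figure says anything about which items were missed, which is the further ambiguity the text describes.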

The test results cited in this section differ in the certainty of their
conclusions about achievement subgroups. At one extreme, the results of
the Illinois Decade study are very ambiguous, because two available
measures of change lead to different conclusions about achievement
subgroup differences. At the other extreme, some (but not all) of the relevant
tabulations from the National Assessment are clear-cut, because some show
increases in the lowest quartile concurrently with decreases in the top
quartile. Use of different scaling or reporting conventions would generally
not alter the conclusion of a narrowing achievement gap in those cases.



1. This ambiguity also arises with other common measures of change, such as scaled-score
or standardized-score changes.

Technically, the problem has several aspects. One is that the metrics commonly used
are not ratio scales; indeed, they are arguably not even interval scales. The construction
of the tests poses additional problems, for a single test is unlikely to be a comparably
comprehensive measure of mastery at two very different levels of achievement and
therefore may understate the relative change of students at one level. The tabulation
and reporting of results further complicates comparisons, since information on the
additional items correctly or incorrectly answered is rarely reported, particularly for
achievement subgroups.

TRENDS IN THE LOWEST AND HIGHEST QUARTILES



The most extensive and best-known information on the relative trends
among students in the top and bottom achievement quartiles is from the
NAEP. Relevant information is also available, however, from the SAT, the
ACT, the ITBS, and the Illinois Decade Study.



The National Assessment of Educational Progress 

In general, the currently available NAEP tabulations show a narrowing of
the gap between the top and bottom quartiles in all three age groups (9, 13,
and 17) and subjects (reading, science, and mathematics) for which the
analysis was conducted. The comparative data, however, span only four or
five years during the 1970s. Comparable tabulations of the NAEP are
unavailable for the remaining middle half of the student population.

These particular NAEP trends show great variation (changes ranged
from sizable improvements to large declines), which complicates
comparison of achievement subgroups. This variation probably results in
part from the period over which changes were measured: beginning between
1972 and 1974 and ending between 1978 and 1979, depending on the subject
tested. Given the cohort pattern shown by the end of the decline, it is likely
that these particular assessments of trends among nine-year-olds began
about the time that their brief and small decline ended. The trend for
13-year-olds probably spanned the last years of the decline and the first years
of the upturn, while the trend among 17-year-olds corresponds roughly to
the last years of the decline. Consistent with this cohort pattern, the NAEP
data described here show few declines among 9-year-olds, few gains among
17-year-olds, and a more mixed pattern among 13-year-olds. Comparisons
are thus clearest if made within any one age group.
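The expected pattern for each NAEP age group follows from simple arithmetic on assessment years and ages. The Python sketch below is illustrative only: it approximates birth year as assessment year minus age, takes 1963 as the birth cohort at the low point (the timing suggested elsewhere in the paper), and uses a hypothetical function name.

```python
LOW_COHORT = 1963           # birth cohort at the low point, per the other data
START_YEARS = (1972, 1974)  # range of first assessments, per the text
END_YEARS = (1978, 1979)    # range of last assessments, per the text


def expected_pattern(age: int) -> str:
    """Pattern the cohort model would predict for one NAEP age group.

    Assumption: birth year is approximated as assessment year minus age.
    """
    earliest_born = START_YEARS[0] - age
    latest_born = END_YEARS[1] - age
    if earliest_born >= LOW_COHORT:
        return "mostly gains"
    if latest_born <= LOW_COHORT:
        return "mostly declines"
    return "mixed"


patterns = {age: expected_pattern(age) for age in (9, 13, 17)}
for age, label in patterns.items():
    print(f"age {age}: {label}")
```

Under these assumptions the nine-year-olds assessed were all born in 1963 or later, the 17-year-olds were all born before 1963, and the 13-year-olds straddle that cohort, which is the pattern of few declines, few gains, and mixed results described above.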

In the lowest quartile, nine-year-olds showed improvement in two of
three subject areas and no change in the other. This held true for both black
and white students (see Table D-1). In the top quartile, black students also
showed improvement. White students did not, however; they showed sizable
declines in two subjects and no change in a third.

TABLE D-1.  RECENT TRENDS IN THE NATIONAL ASSESSMENT, BY ACHIEVEMENT SUBGROUPS AND ETHNICITY

                                                     Subject Area
Group                   Reading                     Science                     Mathematics

9-Year-Olds in the 4th Grade

  Lowest Quartile
    Black students      Improvement; gain of        No significant change       Improvement; gain of
                        1.4 percentage points       in performance              2.9 percentage points

    White students      Improvement; gain of        Improvement; gain of        No significant change
                        4.6 percentage points       1.7 percentage points       in performance

  Highest Quartile
    Black students      Improvement; gain of        No significant change       Improvement; gain of
                        3.0 percentage points       in performance              [illegible] percentage points

    White students      No significant change       Significant decline;        Significant decline;
                        in performance              2.4 percentage points       3.4 percentage points

13-Year-Olds in the 8th Grade

  Lowest Quartile
    Black students      Improvement; gain of        No significant change       Improvement; gain of
                        3.4 percentage points       in performance              2.8 percentage points

    White students      Improvement; gain of        Improvement; gain of        No significant change
                        1.3 percentage points       2.0 percentage points       in performance

  Highest Quartile
    Black students      Improvement; gain of        No significant change       Significant decline;
                        3.5 percentage points       in performance              2.4 percentage points

    White students      No significant change       Significant decline;        Significant decline;
                        in performance              4.1 percentage points       3.2 percentage points

17-Year-Olds in the 11th Grade

  Lowest Quartile
    Black students      No significant change       No significant change       Improvement; gain of
                        in performance              in performance              1.6 percentage points

    White students      Significant decline;        No significant change       Significant decline;
                        [illegible] percentage      in performance              1.8 percentage points
                        points

  Highest Quartile
    Black students      No significant change       Significant decline;        Significant decline;
                        in performance              0.9 percentage points       5.5 percentage points

    White students      No significant change       Significant decline;        Significant decline;
                        in performance              4.2 percentage points       4.4 percentage points

SOURCE: National Assessment of Educational Progress, "Educational Winners and Losers, the Whos and Possible Whys"
(press release, February [day illegible], 1983).

Among 13-year-olds as well, the lowest quartile showed mostly
improvements in performance, albeit typically smaller than among the younger
children. (This, too, is expected in light of the cohort pattern.) White
students in the highest quartile again showed declines in two of three
subjects; among blacks in this quartile, gains and losses were approximately
balanced.

A similar discrepancy between the highest and lowest quartiles also
appeared among the 17-year-olds, although overall, as expected, declines
predominated over gains. Blacks in the lowest quartile showed no change in
two subjects and a small gain in a third. Their white counterparts showed
slight declines in two of three subjects. In contrast, in the top quartile, both
races showed large declines in two subject areas.



Other Data 

Data from other sources, however, are partially inconsistent with the NAEP 
data and call into question whether there was a general closing of the gap 
between high- and low-achieving students on a variety of tests and over the 
entire period of the achievement decline.

Tabulations of SAT candidates categorized by self-reported class rank
show a similar narrowing of the gap between high- and low-achieving
students since 1975. Moreover, this pattern occurred over most of the range
of achievement; each group declined relative to all others ranking lower,
bringing the scores of high-ranking and low-ranking students closer to each
other. (These data unfortunately do not include reliable information about
the bottom 20 percent.)

Ambiguous evidence on the relative trends among students in the top
and bottom quartiles is found in the "Illinois Decade Study," a comparison of
scores on a fairly high-level achievement test administered to Illinois high
school juniors in the 1970 and 1981 school years. Declines in raw scores
were consistently larger among students at the 75th percentile, albeit
sometimes by a very small margin (see Table D-2). 2/ On the other hand,



2. Student Achievement in Illinois, 1970 and 1981 (Springfield: Illinois State Board of
Education, September 1983). Note that these data are not entirely comparable to the
NAEP achievement subgroups analysis. Rather than reporting the average scores of
all students above the 75th percentile, as in the NAEP reports, the Illinois Decade
study reports results for the students at the 75th percentile. The same distinction applies
to the data on scores at the 25th percentile. Thus, the NAEP analyses incorporate
students who are further apart in their levels of achievement.

TABLE D-2.  CHANGE ON THE ILLINOIS DECADE TEST
            AMONG STUDENTS AT THE 25TH AND 75TH PERCENTILES

                          75th Percentile           25th Percentile

                          Raw        Percent        Raw        Percent
                          Change     Change         Change     Change

Mathematics 1             -0.7        -4.8          -0.2        -2.7
Mathematics 2             -1.6       -12.5          -0.8       -13.1
English 1                 -3.4       -16.0          -1.8       -13.8
English 2                 -3.1       -15.2          -2.9       -22.1
Social Studies            -2.6       -16.0          -1.0       -11.0
Natural Science           -0.8        -8.6          -0.3        -3.8



SOURCE: CBO calculations based on Illinois State Board of Education, Student
Achievement in Illinois, 1970 and 1981, Exhibit A-5; and J. Tyam, personal
communication.



when the changes are expressed in proportional terms, this pattern
disappears. The percent changes in scores at the 25th percentile were
sometimes smaller but sometimes larger than those at the 75th percentile.
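The way the two measures of change can point in different directions is easy to see by comparing them row by row. The Python sketch below uses three rows as they appear in Table D-2 (Mathematics 1, English 1, and English 2); the helper function is hypothetical, and the comparison is only an illustration of the ambiguity, not a calculation reported in the paper.

```python
# (raw change at 75th, percent change at 75th,
#  raw change at 25th, percent change at 25th), from Table D-2
rows = {
    "Mathematics 1": (-0.7, -4.8, -0.2, -2.7),
    "English 1": (-3.4, -16.0, -1.8, -13.8),
    "English 2": (-3.1, -15.2, -2.9, -22.1),
}


def larger_decline(at_75th: float, at_25th: float) -> str:
    """Name the percentile showing the larger (more negative) change."""
    if at_75th < at_25th:
        return "75th"
    if at_25th < at_75th:
        return "25th"
    return "tie"


summary = {
    subject: (larger_decline(raw75, raw25), larger_decline(pct75, pct25))
    for subject, (raw75, pct75, raw25, pct25) in rows.items()
}
for subject, (by_raw, by_pct) in summary.items():
    print(f"{subject}: raw -> {by_raw}, percent -> {by_pct}")
```

For all three subjects the raw-score decline is larger at the 75th percentile, but in percent terms English 2 shows the larger decline at the 25th percentile, so the ranking of the two groups depends on the measure chosen.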

Data from other tests, however, and from the SAT earlier in the
period of decline (before 1975), cast doubt on the NAEP results. A
tabulation of changes in SAT scores among groups of students divided by
their percentile rankings on the SAT itself showed no comparable narrowing
of the gap in the years before 1975. Indeed, in mathematics, the gap
appears to have widened slightly (see the section below on "select
students"). In addition, if the gap between the top and bottom quartiles
were narrowing, one would expect a shrinking standard deviation; that is, a
narrower distribution of scores. 3/ Since the beginning of the 1970s,



3. The standard deviation would shrink unless there were other, offsetting changes in 
the distribution of scores--such as a change in the distribution of scores in the middle 
two quartiles. Moreover, without such other distributional shifts, changes in the 
composition of the test-taking group would not alter this link between the standard 
deviation and the gap between the top and bottom quartiles. Any change in the standard 
deviation attributable to compositional changes (such as an increase resulting from 
lower dropout rates) would also be reflected in the gap between high- and low-achieving 
students. 
however, both the SAT and ACT have shown stable or slightly increasing 
standard deviations. 4/ The standard deviation of scores on the ITBS has 
also been increasing. 5/ Between the 1970 and 1977 school years, the 
standard deviations of the SRA achievement series showed different 
changes, depending on subject area and grade. In general, they tended to 
increase in the younger grades but decrease in the higher grades. 6/ Given 
known problems in obtaining truly representative norming samples for such 
tests in different years, however, as well as changes in the 
representativeness of the samples over time, changes in the standard deviations of norming 
data should perhaps be given less weight than those in the other data 
sources. 7/ 
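
The link between the quartile gap and the standard deviation can be checked with
a small Python sketch. The score distribution is hypothetical (evenly spaced
scores from 200 to 800), and the helper names are illustrative. The sketch moves
the bottom quarter up and the top quarter down while leaving the middle two
quartiles untouched, which is the case described in the text where no offsetting
change occurs.

```python
import statistics


def quartile_gap(scores):
    """Difference between the mean of the top quarter and the bottom quarter."""
    ordered = sorted(scores)
    k = len(ordered) // 4
    return statistics.mean(ordered[-k:]) - statistics.mean(ordered[:k])


def narrow_tails(scores, shift):
    """Move the bottom quarter up and the top quarter down by `shift` points."""
    ordered = sorted(scores)
    k = len(ordered) // 4
    return ([s + shift for s in ordered[:k]]
            + ordered[k:-k]
            + [s - shift for s in ordered[-k:]])


# Hypothetical distribution: evenly spaced scores from 200 to 800.
before = list(range(200, 801, 5))
after = narrow_tails(before, shift=20)

print(quartile_gap(before), statistics.pstdev(before))
print(quartile_gap(after), statistics.pstdev(after))
```

Under these assumptions both the gap and the standard deviation fall together, so
a stable or rising standard deviation is hard to reconcile with a narrowing gap
unless something else in the distribution has changed.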

TRENDS AMONG COLLEGE-BOUND STUDENTS 



Much of the public awareness of the achievement decline stems from the 
decline in SAT scores. But students taking college admissions tests (the SAT 
and ACT) and those planning to attend four-year colleges constitute only 
roughly half of the senior class, and their average level of achievement is 
above the overall average. 8/ Thus, it is important to gauge whether 



4. The College Board, College-Bound Seniors, 1984; and American College Testing 
Program, unpublished tabulations. 

5. H. D. Hoover, Iowa Testing Programs, personal communication, March 1984. 

6. Science Research Associates, SRA Achievement Series, Technical Report #3, Table 2. 

7. With respect to the problems in norming samples for such tests, see Roger F. Baglin, 
"Does 'Nationally' Normed Really Mean Nationally?" Journal of Educational 
Measurement, vol. 18 (Summer 1981), pp. 97-108; and Science Research Associates, 
SRA Achievement Series, Technical Report #3. 

8. The group taking college-admissions tests and those entering college are not entirely 
the same, since not all college-bound students take the tests. In 1984, about 28 percent 
of those students graduating (excluding those obtaining high-school equivalency 
credentials) took the ACT, and 37 percent took the SAT. Those groups overlap to some 
unknown degree, however, so the proportion taking one or the other is less than the 
sum. The proportion taking such tests was lower during the early years of the decline. 
Similarly, 46 percent of all seniors in the class of 1980 (a larger group than all graduates, 
because of senior-year drop-outs) planned to attend at least four-year colleges. See The 
College Entrance Examination Board, National College-Bound Seniors, 1985 (New 
York: The College Board, 1985); American College Testing Program, Executive 
Summary: National ACT Assessment Results, 1984-1985 (Iowa City: ACT, 1985); 
National Center for Education Statistics, Projections of Education Statistics to 1990-91 
(Washington, D.C.: NCES, 1982); and Donald A. Rock, Ruth B. Ekstrom, Margaret 
E. Goertz, Thomas L. Hilton, and Judith Pollack, Factors Associated with Decline of 
Test Scores of High School Seniors, 1972 to 1980 (Washington, D.C.: Center for Statistics, 
U.S. Department of Education, 1985). 
trends on college-admissions tests are indicative of comparable trends 
among high-school seniors in general and, if not, whether differences reflect 
different trends among college-bound achievement subgroups or some other 
factors. 

A difference between the trends shown by college admissions tests and 
tests given to all students need not indicate that achievement trends in the 
relatively high-achieving group of students taking the test are different 
from those in other achievement subgroups. A difference in score trends 
could also reflect changes in the self-selection of students taking the tests 
or differences between the tests themselves and other tests administered to 
the student body as a whole. 

As noted in Chapter IV, the decline in average scores on both the SAT 
and ACT was exacerbated by changes in the self-selection of students 
choosing to take the tests. In the case of the SAT, research suggests that 
over half of the decline between 1963 and 1970, but relatively little of it 
since then, reflected changes in the composition of the group taking the 
test. 9/ Thus, in one sense, both the SAT and the ACT exaggerate the 
decline, in that the drop in average scores would have been substantially less 
if the test-taking group had remained constant or had changed only as the 
entire school-age population changed. (The research on this issue is 
described in CBO's forthcoming volume, Educational Achievement: 
Explanations and Implications of Recent Trends.) 
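
The arithmetic behind this point can be sketched in a few lines of Python. The
group shares and mean scores below are hypothetical, chosen only to show how an
observed drop splits into a part that would have occurred with a constant
test-taking group and a part attributable to the changing mix of test takers.
They are not estimates from the research cited here.

```python
def overall_mean(groups):
    """Weighted mean score; `groups` maps name -> (share of test takers, mean)."""
    total_share = sum(share for share, _ in groups.values())
    return sum(share * mean for share, mean in groups.values()) / total_share


# Hypothetical figures, chosen only to illustrate the arithmetic.
earlier = {"traditional": (0.80, 500.0), "newer": (0.20, 420.0)}
later = {"traditional": (0.60, 490.0), "newer": (0.40, 410.0)}

observed_drop = overall_mean(earlier) - overall_mean(later)

# Hold the earlier mix of test takers fixed and apply the later group means.
fixed_mix = {name: (earlier[name][0], later[name][1]) for name in earlier}
drop_with_constant_group = overall_mean(earlier) - overall_mean(fixed_mix)

# Whatever remains of the observed drop is attributable to the changing mix.
compositional_part = observed_drop - drop_with_constant_group
print(observed_drop, drop_with_constant_group, compositional_part)
```

In this made-up case each group's mean falls by 10 points, yet the overall average
falls by 26, so more than half of the observed drop reflects composition rather
than a change within groups.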

This exaggeration of the decline, however, does not imply a greater 
drop in achievement among the relatively high-scoring achievement 
subgroups that tend to take these tests. A larger real decline in that group 
would be indicated if the decline on the SAT were larger than that on tests 
given to all high-school seniors, even after removing the influence of 
self-selection changes and accounting for differences between the tests. No 
existing studies, however, fully clarify whether there would be a greater 
decline on the SAT under those conditions, in part because there is not 
sufficient information to adjust for differences between the tests. 10/ 



9. Advisory Panel on the Scholastic Aptitude Test Score Decline, On Further Examination 
(New York: The College Board, 1977), p. IS. 

10. In this context, one would want either confirmation that the tests involved would show 
similar trends if administered to the same students, or sufficient information to adjust 
the trends from one test to parallel those that would be produced by the other. Although 
equating studies that permit comparison of scores among tests at any one time are 
common, similar studies that permit comparisons of trends are largely lacking. Thus, 
as noted in Chapter III, much of the variation in trends among tests cited in this paper 
remains unexplained. 
The available evidence, while not fully conclusive, does not suggest 
that the achievement decline was sharper among college-bound students 
than in the student population as a whole. Indeed, the decline might have 
been less severe in some college-bound groups during the early years of the 
decline. One study that directly compared trends in reading achievement 
among all seniors, college entrants, and SAT candidates between 1960 and 
1972 found that the scores of college entrants, unlike those of SAT 
candidates, dropped only approximately as much as those of all seniors. 11/ 
Since the college-bound population was also becoming less select during this 
period, the similarity might indicate that the average scores of some groups 
that traditionally sent many students to college were declining less than 
others, thereby offsetting the effects of the growing number of 
lower-achieving students going to college. 12/ 

For the years since 1972--the larger part of the period of decline on 
the SAT--there is no evidence that trends among college-bound students as 
a whole differed substantially in either direction from those among all 
seniors. In the nationally representative comparison of the NLS and HSB, 
seniors stating that they planned to attend four-year colleges or graduate 
schools showed declines in vocabulary, reading, and mathematics roughly 
comparable to those of the whole senior class. 13/ Comparisons of trends on 
a variety of tests administered to juniors and seniors show some trends in 
the general student body that are more favorable than those on the SAT and 
ACT but others that are less favorable. Moreover, the trends on the SAT 
and ACT are inconsistent with each other (see, for example, Table III-2 in 
Chapter III). Given this inconsistency and the unexplained variation in 
trends among tests, disparities between the ACT and SAT and any given test 
administered to the student population as a whole could be reasonably 
attributed to differences in test characteristics rather than to variations in 
trends among achievement subgroups. 



11. Albert E. Beaton, Thomas L. Hilton, and William B. Schrader, Changes in the Verbal 
Abilities of High School Seniors, College Entrants, and SAT Candidates Between 1960 
and 1972 (New York: The College Board, 1977). 

12. Advisory Panel on the Scholastic Aptitude Test Score Decline, On Further Examination, 
pp. 13-16. Note that the SAT candidate group underwent changes in composition beyond 
those affecting the college-bound group as a whole, reflecting a change in the proportion 
and characteristics of those college-bound students taking the SAT. 

13. Donald Rock and others, Factors Associated with Decline of Test Scores. 
SELECT STUDENTS 



Because recent trends in the National Assessment have been relatively 
unfavorable among the top quartile of students, some people might assume 
that select students, variously defined, have also lost ground relative to 
other students. 14/ This seems not to be the case, however. While some 
data show comparatively steep declines among select students, the available 
data as a whole do not, and the recent upturn appears to have been, if 
anything, particularly striking among some select groups. In addition, in 
mathematics--an area of particular public concern in recent years--select 
students might have been gaining ground for a considerable time. 

Reports of trends among select students vary markedly, however. 
Some show greater declines than among other groups, while others show less 
marked declines or even no decline at all. This variation probably reflects 
the diversity both in criteria used to delineate select students and in the 
tests administered to them, as well as the sparseness of the available data. 
For example, the groups chosen to represent the select include: students 
scoring above specified thresholds on the SAT; students taking more 
selective tests, such as the College Board achievement and advanced 
placement tests; students in the highest ranks of their classes; and students 
taking certain advanced courses (such as high school calculus). 

In addition, limitations of the data seriously cloud comparisons 
between select students and others. Only a few tests have been tabulated in a 
way that permits direct comparison of select and other students. 15/ Those 
that are directly comparable are limited to high school students or, more 
narrowly, to college-bound juniors and seniors. Moreover, many of the tests 
that are designed intentionally for select students--such as the College 
Board achievement and advanced placement tests--are optional, and there is 
only limited information about the effects of changes in the test-taking 
groups on average scores. For example, the proportion of students taking 



14. Reports of trends among these students have used a variety of terms to label them. 
"Select students" is used here as a generic term for various groups of the 
highest-achieving students. 

15. Scores on many tests could be tabulated to permit such a comparison, subject to the 
limitations that small sample sizes and problems of scaling often would impose on how 
select a group could be assessed with confidence. Such reanalysis of the data at the level 
of individual students, however, is beyond the scope of this paper. 
advanced placement tests has changed dramatically in recent years, as have 
their geographical distribution and the colleges they subsequently attend. 



The SAT 

Perhaps the most commonly cited evidence of declining achievement among 
select students is the drop in the proportion of SAT candidates receiving 
very high scores. For example, the proportion receiving scores over 700 
dropped sharply between 1968 and 1980, particularly on the verbal test (see 
Figure D-1). In 1966, roughly 2.6 percent of SAT candidates obtained verbal 
scores in excess of 700; that percentage had dropped to about 0.8 percent 15 
years later. The drop was both more erratic and less severe on the 
mathematics test--from roughly 4.1 percent to 2.7 percent. (This parallels 
the fact that the drop in the mean score was much smaller on the 
mathematics test; see Chapter III, Figure III-4.) A tabulation of this sort, 
however, cannot be compared directly with the overall decline, which is 
usually measured in terms of changes in the average scores themselves. 

Trends in the proportion of candidates receiving high SAT scores also 
provide clear evidence that the recent upturn has been particularly sharp 
among some select students, at least in mathematics. The proportion of 
SAT-M scores over 700, for example, has risen roughly two-thirds of the 
way to its 1966 high level, even though it has been rising for only four years 
(see Figure D-1). The proportion of verbal scores over 700, however, has 
shown far less improvement. 

Two other tabulations of SAT scores that are more directly 
comparable to common measures of the overall decline yield apparently--but 
perhaps not truly--contradictory information on the relative trends among 
select students. Both tabulations examine changes in the average scores of 
various select groups--rather than the number of students scoring above 
certain thresholds--but they use different criteria for categorizing students 
as select and encompass different time periods. 

The first of these tabulations of select SAT scores indicates that from 
1966 to 1975--a period that encompasses the worst of the SAT decline--
average scores on the mathematics test declined somewhat less among the 
high-scoring than among lower-scoring SAT candidates (see Figure D-2). 
The average score at the 90th percentile declined the least, and scores at 
the 75th and 60th percentiles dropped substantially less than scores at lower 
percentiles. Only in the mid-1970s, however, did the top-scoring group show 
a different trend than that of the median SAT candidate. Moreover, no 
Figure D-1. 
similar differentiation appeared on the verbal scale. 16/ Unfortunately, 
no comparable tabulation of SAT scores is available for years after 1975. 

The second tabulation, which only began in 1976 and in which select 
students were included on the basis of self-reported class rank rather than 
SAT scores, shows virtually the opposite pattern: more select groups lost 
ground on the SAT verbal test relative to other students (see 
Figure D-3). 17/ This trend was apparent both during the last years of the 
decline and during the first few years of the subsequent upturn. In contrast, 
since 1982, the gap between the various groups has largely remained 
constant. Indeed, this pattern was not limited to select students; across the 
entire range, students with higher class rank showed less favorable trends 
than did students with lower class rank. 18/ Scores of students reporting 
themselves to be above the 90th percentile in class rank fell 16 points on the 
SAT-V between the 1978 and 1982 school years and only began turning up in 
1983. The pattern among students between the 80th and 90th percentiles is 
quite similar, but the decline is four points smaller, and the subsequent 
upturn is clearer and might have begun a few years earlier. In contrast, the 
average scores of the broad middle of students--those falling between the 
20th and 80th percentiles in class rank--showed at most a small drop 
between 1975 and 1979 and have been rising quite steadily since. 

While less favorable trends appeared among students with higher class 
ranks on the SAT mathematics scale as well, the mathematics trends 
differed in some respects (see Figure D-3). As in the case of the verbal 
scale, the widening gap between achievement groups was quite consistent 
across the entire range of achievement levels, and the upturn began 
consistently earlier in lower-ranked groups of students. In the case of 
mathematics, however, the widening of the gap between high- and 
low-achieving groups had ended before the overall rise in scores began in 1981, 
and, indeed, the top 10 percent of students gained a bit relative to others 
during the first years of the score increase. 



16. June Stern, Selected Percentiles for Scholastic Aptitude Test Scores (1966-67 through 
1975-76) (New York: The College Board, 1977). 

17. William W. Turnbull, Changes in SAT Scores: What Can They Teach Us? (College 
Board-ETS Joint Staff Research and Development Committee, forthcoming), Table II. 

18. Although the trend among students below the 20th percentile is largely consistent with 
this generalization, it cannot be interpreted with confidence, for it reflects very few 
students--only 0.6 percent of SAT candidates in the 1983-1984 school year. 
Figure D-3. SAT Scores by Percentile of Class Rank (By subject, 
differences from 1975) 
Is the apparently steeper SAT decline after 1976 among students with 
high class ranks inconsistent with the comparable or even lesser declines of 
students with high SAT scores in earlier years? Not necessarily. The 
mathematics trends suggest that the upturn might have begun earlier among 
lower-achieving students. If so, it could cause an apparently greater decline 
among select students during the last few years of the decline even if select 
students showed comparable or lesser drops over the entire period of the 
decline. In addition, select students might have declined less on the SAT 
during the earlier and middle years of the decline but more at its end--a 
pattern that could easily arise if the trends reflect a variety of different 
causes. Alternatively, class rank and SAT percentiles might delineate 
different select groups that experienced different trends throughout the 
decline. This possibility is strengthened by the fact that class rank, unlike 
SAT percentiles, is based on self-reports by students and is therefore subject 
not only to random error, but also to systematic differences in response bias 
among different groups of students. Finally, changes in grading criteria or 
students' choices of classes might have altered the meaning of class rank. 
Those students currently ranking in the top 10 percent, for example, might 
be dissimilar in some respects from those with comparable ranks in 1976. 
The SAT trends among students divided by class rank also fail to show 
the sharp relative gains in mathematics scores among select students 
evidenced by the dramatic rise in SAT-M scores above 700, but this 
discrepancy might also be more apparent than real. The difference suggests 
that the atypically sharp rise in mathematics achievement is limited to a 
more select group of students than those reporting themselves in the top 10 
percent of their classes. Students scoring over 700 are far fewer in number 
than those reporting themselves to be in the top 10 percent of class ranks; 
even after the recent increases, only 3.6 percent of SAT candidates are in 
the former group, compared with 21.1 percent in the latter. 19/ The former 
group also presumably comprises students who are more select in terms of 
their coursework in mathematics. 



The Illinois Decade Study 

This study suggests that the decline among select students was no worse, 
and perhaps slightly less severe, than that among other students. The 
decline among students at the 95th percentile (that is, those at the cutoff 
for the top 5 percent) was generally similar to that of students at the 75th 
and 50th percentiles, with one exception: on one of two mathematics tests, 
those students at the 95th percentile showed almost no decline. 20/ 



The Iowa Test of Basic Skills 

National norming data from the ITBS show scores of eighth-grade students 
at the 90th percentile declining considerably less than scores of the median 
student between 1970 and 1977--a period that includes the first year of the 
upturn. On this test, unlike some of the others described here, the relative 
gains of the select students were greater in language-related areas than in 
mathematics. 21/ 



19. The College Board, National College-Bound Seniors (1985), Tables 1 and 7. 

20. On that particular math test, the lowest-scoring students (in this case, those at the 25th 
percentile) declined by as little as those at the 95th percentile in absolute terms, while 
those students falling in between declined substantially more. Illinois State Board 
of Education, Student Achievement in Illinois, p. 10. 

21. Hieronymus, Lindquist, and Hoover, Iowa Test of Basic Skills: Manual for School 
Administrators, Table $M. Similar patterns were apparent at many of the other grade 
levels as well, but their interpretation is less clear, since the differences at younger 
ages included more of the period of increasing scores. 
The Second International Assessment of Mathematics 



A recent international assessment of mathematics achievement suggests 
that select American students--in this case, those taking calculus while in 
high school--improved in mathematics. This assessment, carried out in 
1981-1982 in a national sample of American schools, included testing of 
seniors in calculus and pre-calculus classes--together, about 10 percent to 
12 percent of seniors. The performance of this group was slightly superior 
to that of comparable students in a similar international assessment 17 
years earlier (based on items included in both assessments), although it was 
still quite poor by international standards. This improvement appears to 
have been far stronger among the students in the calculus classes. 22/ 

The College Board Achievement Tests 

These tests of achievement in specific subject areas are taken by a small 
fraction of 1 percent to about 10 percent of graduates, depending on the 
subject area and year. Typically, they showed stability or slight increases 
during the last half of the period of declining achievement, but this might 
merely reflect a rapid drop in the proportion of graduates taking the tests 
(see Figure D-4). 23/ That is, if the declining proportion of graduates taking 

22. F. Joe Crosswhite, John A. Dossey, Jane O. Swafford, Curtis C. McKnight, Thomas J. 
Cooney, and Kenneth J. Travers, Second International Mathematics Study: Summary 
Report for the United States (Champaign, Ill.: Stipes Publishing Co., 1985), pp. S, 
70-73. Details of the earlier assessment can be found in Torsten Husen, ed., International 
Study of Achievement in Mathematics: A Comparison of Twelve Countries (Stockholm 
and New York: Almqvist & Wiksell and John Wiley & Sons, 1967). 

23. Because of scaling, the drop in the proportion of students taking the achievement tests 
is more marked than it might seem in Figure D-4. For example, the proportion of 
students taking the biology test dropped by about 22 percent between 1971 and 1979, 
but that decline appears moderate in Figure D-4. 

Test score data are from College Board, National College-Bound Seniors, various years; 
comparable data on scores and participation rates are unavailable before the 1971 school 
year. Participation rates are obtained by dividing the number of test takers in a 
given year by the number of high-school graduates in that year in Projections of 
Education Statistics to 1990-91 (Washington, D.C.: National Center for Education 
Statistics, 1982). This yields a slight overestimate of the proportion of graduates 
taking the test, because some students take the test in their junior year and repeat it 
in their senior year. More important, it overstates the selectivity of the tests in areas 
in which the SAT and the Achievement Tests are heavily used. It adds to the 
denominator students in areas where few students take these tests (for example, areas 
in which the ACT is the dominant college-entrance test). 
Figure D-4. 
the tests reflects a drop in the number of less able students taking the 
tests, the resulting increase in the ability level of the remaining group 
taking the tests might have masked a decline within ability groups. For 
example, the proportion of graduates taking the English composition test 
dropped roughly from 10 to 6 percent between the 1971 and 1978 school 
years, and similar declines in participation occurred in other subject areas 
as well. 
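
A brief Python sketch may help keep two ways of describing such a drop apart:
the change in percentage points and the proportional change in the rate. The
participation rate is computed as test takers divided by graduates. The counts
below are hypothetical, chosen only to reproduce rates of roughly 10 and 6
percent, and the function names are illustrative.

```python
def participation_rate(test_takers, graduates):
    """Share of graduates taking a test, in percent."""
    return 100.0 * test_takers / graduates


def relative_change(old_rate, new_rate):
    """Proportional change in a rate, in percent of the old rate."""
    return 100.0 * (new_rate - old_rate) / old_rate


# Hypothetical counts chosen to reproduce rates of roughly 10 and 6 percent.
rate_1971 = participation_rate(300_000, 3_000_000)
rate_1978 = participation_rate(186_000, 3_100_000)

point_change = rate_1978 - rate_1971
percent_change = relative_change(rate_1971, rate_1978)
print(rate_1971, rate_1978, point_change, percent_change)
```

With these made-up counts, a fall from 10 to 6 percent of graduates is a change of
4 percentage points but a proportional decline of 40 percent in participation.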

Conversely, the relative stability of many of the College Board 
achievement test scores since 1979 might hide a substantial increase in 
achievement within ability groups. Since 1979, average scores on the more 
common College Board achievement tests have generally held stable or 
increased modestly in the face of moderate-to-large increases in the 
proportion of students tested. (American History is an exception; it showed 
a large increase in average scores but a slight decrease in participation.) 



The College Board's Advanced Placement Tests 



The average score on this set of tests--taken by college-bound students seeking 
college credit for advanced coursework in high school--has remained stable 
since 1969. This stability, however, might mask a sizable increase in 
educational accomplishment. 

Relatively few graduates take each of the Advanced Placement (AP) 
tests, but the total proportion taking any of them has roughly tripled--from 
under 2 percent to about 6 percent--over the past decade. 24/ During this 
decade of rapid growth--as well as the preceding half-decade of fairly stable 
test volume--the average score on AP tests in all subjects remained quite 
stable, increasing about 5 percent (see Figure D-5). 

The rapid growth in the proportion of seniors taking the AP tests need 
not indicate the sort of compositional changes that affected the SAT in the 
1960s, and the stability of AP scores accordingly should be interpreted 
differently. In the case of the SAT, the growth in the proportion of students 
taking the test in part indicated an increase in the proportion of test takers 
from lower-ability groups. In such a situation, a stable overall average 
score would indicate increasing achievement within ability groups. In the 

24. Data are from published and unpublished College Board tabulations. These proportions 
are subject to the same caveats as are described above with respect to the College Board 
achievement tests. 
Figure D-5. 
case of the AP tests, however, much of the growth in volume reflects the 
expansion of the AP program into additional geographic areas, as additional 
universities decided to offer credit for AP tests and more school districts 
and individual schools decided to offer advanced courses preparing students 
for the AP tests. For example, the decision of some large state universities 
to offer AP courses contributed substantially to the growth of the AP 
program, and students going to such universities--such as the University of 
California, the University of North Carolina, and the State University of 
New York--now account for a large share of the total number of AP 
examinations. 25/ Thus, the growing proportion of students taking AP 
exams might be lowering the average ability of the test-taking group, but 



25. College Entrance Examination Board, unpublished tabulations; and Harlan Hanson, 
The College Board, personal communication, March 1985. 
probably far less than did the growth of the SAT pool two decades ago. 
While some of the new students added to the AP pool might be lower in 
ability than those in the smaller pool a decade ago--when more selective 
schools contributed a greater share of the students--many are probably 
comparable in ability, differing only in geographic location. 

The constancy of AP scores in the face of rapid growth in the number 
of test-takers accordingly can be seen as an increase in educational 
accomplishment. To the extent that the average ability of the test-takers might 
have decreased, the stable scores reflect an increase in the scores obtained 
by students at any given ability level. To the extent that additional students 
of comparable ability have been drawn into the program, the program's 
growth represents a dramatic increase in the advanced-level coursework of 


CONCLUSION 



Taken together, the available data provide only spotty and inconsistent 
evidence that achievement trends have been relatively more favorable in 
some achievement subgroups than in others. There are some indications of 
relative gains at both ends of the achievement scale--that is, among 
students in the lowest quartile and among certain select students. These 
signs, however, appear limited to certain tests. In addition, if these relative 
gains are not an artifact of certain aspects of those particular tests, some 
apparently might be confined to relatively short periods. 

Indeed, the data suggest that generalizations about relative gains in 
various achievement subgroups are risky, and that inferences for educational 
policy might not be warranted. The variation in trends from one data source 
to another--and even from one tabulation to another of a single data 
source--is more striking than any generalizations about relative 
trends among achievement subgroups. The uncertainty engendered by this 
variation is exacerbated by the many gaps in the available data and by 
technical problems entailed in using the data in their current form to draw 
conclusions about achievement subgroups. 
DIFFERENCES IN ACHIEVEMENT TRENDS AMONG 
BLACK, HISPANIC, AND NONMINORITY STUDENTS 



Evidence that the average scores of black and Hispanic students have 
risen relative to those of nonminority students--but remain well below 
them--is summarized in Chapter IV. Because that conclusion has 
considerable importance, the evidence underlying it is presented in more 
detail in this appendix. 1/ 



BLACK STUDENTS 



Although data on differences in achievement between black and 
nonminority students at any one time are abundant, data sources showing 
relative trends in achievement in those two groups are surprisingly rare. In 
the course of this study, nine data sources with separate trend data for 
black and nonminority students were located. Two are nationally 
representative: the National Assessment of Educational Progress (NAEP), and a 
comparison of the National Longitudinal Study of the High School Senior 
Class of 1972 (NLS) and the High School and Beyond study (HSB). 2/ Two 
others are national but unrepresentative: the Scholastic Aptitude Test 
(SAT) and American College Testing Program (ACT) tests. Data are also 
available from two statewide assessments (North Carolina and Texas) and 
three local districts (Houston, Texas; Cleveland, Ohio; and Montgomery 
County, Maryland). 



1. For an explanation of the ethnic classifications used in this paper, see Chapter IV. The
classifications used in the data sources cited here are not entirely consistent. In each
case, the scores of black students have been compared with the group which comes closest
to being "nonminority"--that is, the group that excludes the largest share of identified
minority groups. This nonminority group, however, varies among data sources. The
SAT "white" category, for example, specifically excludes Asian Americans, native
Americans, Puerto Ricans, and Mexican Americans. In contrast, the closest comparable
category in the NLS/HSB comparison combines non-Hispanic whites with
Asian-American and native American students.

2. Donald A. Rock, Ruth B. Ekstrom, Margaret E. Goertz, Thomas L. Hilton, and Judith
Pollack, Factors Associated with Decline of Test Scores of High School Seniors, 1972-1980
(Washington, D.C.: Center for Statistics, U.S. Department of Education, 1985).
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Eight of these nine data sources showed a consistent and unambiguous
narrowing of the gap between black and nonminority students, leaving little
doubt that this pattern is real and not an artifact of some aspects of the
tests or groups tested. The one partial exception is the ACT. That test did
show a small narrowing of the gap, but the evidence is somewhat
questionable because of inconsistencies among subject areas and large
year-to-year fluctuations. While the reasons for that one partial anomaly are not
clear (several possible explanations are discussed below), it is not sufficient
to call the convergence of scores on all of the other eight tests into serious
doubt. The consistency among the other eight tests is particularly
persuasive in the light of the variation in grade levels, test characteristics,
and student characteristics from one test to another.

This convergence in the scores of black and nonminority students
appears to have three components. The scores of black students:

o Declined less than those of nonminority students during the later
years of the general decline;

o Stopped declining, or began increasing again, earlier; and

o Rose at a faster rate after the general upturn in achievement
began.

These specific conclusions, however, are less certain than is the overall 
convergence between the two groups, for not all are apparent in all eight of 
the data sources. 



The SAT 

Since 1975, black students have gained relative to nonminority students on
both scales of the SAT (see Figure E-1)--a trend that ended with the 1981
and 1983 school years (on the verbal and mathematics scales, respectively).
During the late 1970s, while nonminority students continued to lose ground,
black students improved their scores on the mathematics scale and held
about constant on the verbal scale. During the first years of the overall
upturn in scores, blacks gained more rapidly than nonminority students.

Both the size of the gap and the rate at which it has been shrinking
can be gauged by comparing the average SAT scores of black students with
the distribution of scores of nonminority students. In 1975, the average
black student's score corresponded roughly to the 11th and 12th percentiles
among nonminority students on the mathematics and verbal scales,
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Figure E-1.
Minority/Nonminority Differences on the SAT (In standard
deviations, by subject)
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respectively. In 1984, the average black scores had risen to about the 16th
percentile among nonminority scores on both scales. 3/ While this change
might appear slight, the annual rate of change is in fact roughly comparable
to the average rate of the total SAT decline--a trend that few would label
insignificant. 4/
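
The percentile comparison above can be restated in standard-deviation units with a short calculation. What follows is a minimal sketch, assuming purely for illustration that nonminority scores are normally distributed; the function names are hypothetical, and the results only approximate the rates given in the footnotes, which rest on the actual score distributions and reported standard deviations.

```python
from statistics import NormalDist


def gap_in_sd(percentile: float) -> float:
    """Distance, in nonminority standard deviations, by which a score at the
    given percentile of the nonminority distribution falls below its mean.
    Assumes a normal distribution (an illustrative simplification)."""
    return -NormalDist().inv_cdf(percentile / 100.0)


def annual_narrowing(p_start: float, p_end: float, years: int) -> float:
    """Average yearly reduction of the gap, in standard deviations."""
    return (gap_in_sd(p_start) - gap_in_sd(p_end)) / years


# Percentiles quoted in the text for 1975 and 1984 (nine years apart).
math_rate = annual_narrowing(11, 16, 9)
verbal_rate = annual_narrowing(12, 16, 9)
print(f"mathematics: {math_rate:.3f} SD per year")
print(f"verbal:      {verbal_rate:.3f} SD per year")
```

Under the normality assumption, a move from roughly the 11th or 12th percentile to the 16th corresponds to a gap shrinking from about 1.2 to about 1.0 standard deviations.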



3. These estimates are based on nonminority (white) within-group standard deviations
in 1983-1984 reported in Solomon Arbeiter, Profiles, College-Bound Seniors, 1984 (New
York: The College Board, 1984), p. 81. Although the within-group standard deviation
is technically the appropriate index in a comparison of this sort, using the more
commonly available total standard deviation does not substantially alter the results.
Moreover, the standard deviations of most tests have changed only very slowly, so the
choice of a year from which to take a standard deviation is largely immaterial.

4. During the total period of decline, average SAT verbal and mathematics scores declined
at annual rates of 0.028 and 0.018 standard deviations per year, respectively. During
the past nine years (the only period for which data are available), the gap between black
and nonminority students has shrunk at annual rates of 0.017 and [illegible] standard
deviations per year on the verbal and mathematics scales, respectively (based on 1983-
1984 standard deviations in the total SAT sample).
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The ACT

since 1970, but that gain has been small and is overshadowed by large year-
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The NLS and HSB 
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category as "nonminority" students. In all three subjects tested--reading,
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Figure E-2. 

Black/Nonblack
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SOURCES: CBO calculations based on: American College Testing Program, unpublished and undated tabulations;
American College Testing Program, "Overview of Selected Results" (ACT, unpublished and undated
material); Jackie Woods, ACT, personal communication, December 1985.



vocabulary, and mathematics--the largest average declines occurred among
a group comprising non-Hispanic whites, Asians, and American Indians (but
dominated by the far more numerous non-Hispanic whites). Trends among
black students ranged from a small gain in mathematics to a larger but
modest decline in reading (see Table E-1). 7/



7. None of these changes in the average scores of black students was statistically
significantly different from no change. See Rock and others, Factors Associated with
Decline of Test Scores, Tables D-1, D-2, and D-3.
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TABLE E-1. AVERAGE ACHIEVEMENT OF BLACK
AND OTHER STUDENTS IN THE NLS
AND HSB, BY SUBJECT

Category              1972       1980     Change

Vocabulary
  Black               3.28       3.20      -0.08
  Other a/            7.04       6.22      -0.82 b/

Reading
  Black               5.94       5.56      -0.38
  Other a/           10.51       9.57      -0.94 b/

Mathematics
  Black               6.50       6.69       0.19
  Other a/           13.90      12.97      -0.93 b/

SOURCE: Rock and others, Factors Associated with Decline of Test Scores, Tables D-1,
D-2, and D-3.

a. "Other" category includes non-Hispanic whites, Asian Americans, and American
Indians.

b. Statistically significant at the .05 level or less.
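
The narrowing implied by Table E-1 can be computed directly as the 1972 gap minus the 1980 gap in each subject. The sketch below is illustrative only: the dictionary layout and function name are hypothetical, and the two 1972 mathematics entries are taken as 6.50 and 13.90, the values implied by the printed 1980 scores and changes.

```python
# Mean scores read from Table E-1 as (1972, 1980) pairs.
# Assumption: the 1972 mathematics entries (6.50 and 13.90) are inferred
# from the printed 1980 values and changes.
SCORES = {
    "vocabulary": {"black": (3.28, 3.20), "other": (7.04, 6.22)},
    "reading": {"black": (5.94, 5.56), "other": (10.51, 9.57)},
    "mathematics": {"black": (6.50, 6.69), "other": (13.90, 12.97)},
}


def gap_narrowing(subject: str) -> float:
    """Return the 1972 gap minus the 1980 gap (positive means convergence)."""
    black_1972, black_1980 = SCORES[subject]["black"]
    other_1972, other_1980 = SCORES[subject]["other"]
    gap_1972 = other_1972 - black_1972
    gap_1980 = other_1980 - black_1980
    return gap_1972 - gap_1980


for subject in SCORES:
    print(f"{subject}: gap narrowed by {gap_narrowing(subject):.2f} points")
```

Because the narrowing equals the black change minus the "other" change, the gap shrinks even where both groups declined, provided the "other" group declined more.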



The National Assessment of Educational Progress (NAEP) 

The gap between black and nonminority students also narrowed at all three
ages tested in the NAEP (see Tables E-2 and E-3). Moreover, this narrowing
appeared quite consistently in both the top and bottom achievement
quartiles (see Table D-1 in Appendix D). In some cases, both groups lost
ground, but nonminority students lost more; in others, both blacks and
nonminority students gained, but blacks gained more. In some instances,
black scores increased while the nonminority average declined. Although
not presented in detail here, NAEP assessments in the areas of social studies
and writing also showed a narrowing of the gap among 9- and 13-year-olds.
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TABLE E-2. READING PERFORMANCE OF BLACK AND
NONMINORITY STUDENTS IN THE NATIONAL
ASSESSMENT (Average percent of items
answered correctly and proficiency scores)

                        1970      1974      1979      1983    Change

Percent Correct (change, 1970-1979)

Age 9
  Nonminority a/        66.4      67.0      69.3        NA       2.8
  Black                 49.7      54.5      59.6        NA       9.9

Age 13
  Nonminority a/        62.6      61.9      62.6        NA       0.0
  Black                 45.4      46.5      49.6        NA       4.2

Age 17
  Nonminority a/        71.2      71.2      70.6        NA      -0.7
  Black                 51.7      52.1      52.2        NA       0.5

Proficiency Scores (change, 1970-1983)

Age 9
  Nonminority b/       214.4     215.9     219.7     220.1       5.7
  Black                169.3     181.9     188.9     188.4      19.1

Age 13
  Nonminority b/       260.1     260.9     263.1     263.4       3.3
  Black                220.3     224.4     231.9     236.8      16.5

Age 17
  Nonminority b/       290.4     290.7     291.0     294.6       4.2
  Black                240.6     244.0     246.1     263.5      22.9

SOURCES: National Assessment of Educational Progress, Three National Assessments
of Reading: Changes in Performance, 1970-1980 (Denver: NAEP/Education
Commission of the States, 1981), Tables A-1, A-5, and A-9, and The Reading
Report Card: Progress Toward Excellence in Our Schools (Princeton:
NAEP/Educational Testing Service, 1985), Data Appendix.

NOTE: NA denotes not available.

a. Includes Hispanics in all years. See footnote 9.

b. Includes Hispanics in 1970 only. See footnote 10.
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TABLE E-3. MATHEMATICS PERFORMANCE OF BLACK AND
NONMINORITY STUDENTS IN THE
NATIONAL ASSESSMENT a/
(Average percentage of items answered correctly)

                       1972                              Change
Group             (Estimated) b/     1977      1981     1972-1981

Age 9
  Nonminority          60.1          58.1      58.8       -1.28
  Black                40.2          43.1      45.2        4.99

Age 13
  Nonminority          62.3          59.9      63.1        0.84
  Black                41.1          41.7      48.2        7.07

Age 17
  Nonminority          66.7          63.2      63.1       -3.56
  Black                48.3          43.7      45.0        1.32

SOURCE: CBO calculations based on National Assessment of Educational Progress,
The Third National Mathematics Assessment: Results, Trends, and Issues
(Denver: NAEP/Education Commission of the States, 1983), Table 5.1;
and CBO calculations based on National Assessment of Educational
Progress, Mathematical Technical Report: Summary Volume (Denver:
NAEP/Educational Commission of the States, 1980), Tables 2, 3, and 4.

a. Nonminority category excludes Hispanics in all years.

b. These estimates for 1972 differ from published NAEP results for the 1972 assessment.
The published results for that year are based either on the 1972 item pool or on the items
used in both 1972 and 1977, while the trend results comparing the 1977 and 1981
assessments reflect items used in both the 1977 and 1981 assessments. In order to
circumvent the large disparities in the item sets, 1972 results were estimated here by
adjusting the 1977 results (on the items used in 1977 and 1981) by the 1972-to-1977
change (on the items used in 1972 and 1977).
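
The adjustment described in note b amounts to chaining two comparisons that rest on different item sets. The sketch below illustrates that arithmetic under stated assumptions: the function name is hypothetical, and the -2.0 change is back-computed from the table's own 1972 estimate and 1977 value for nonminority nine-year-olds, not taken from a published NAEP figure.

```python
def estimate_base_year(later_score_new_items: float,
                       change_on_old_items: float) -> float:
    """Estimate a base-year score on the newer item set.

    later_score_new_items: the later-year (1977) average on the items shared
        by the 1977 and 1981 assessments.
    change_on_old_items: the base-to-later (1972-to-1977) change measured on
        the items shared by the 1972 and 1977 assessments.
    """
    return later_score_new_items - change_on_old_items


# Illustrative values for nonminority nine-year-olds: 58.1 is the 1977
# figure in the table; the -2.0 change is an assumption implied by the
# table's 1972 estimate of 60.1.
estimate_1972 = estimate_base_year(58.1, -2.0)
print(f"estimated 1972 average: {estimate_1972:.1f}")
```

The approach assumes that the change measured on one item set would have carried over to the other, which is why the results are labeled estimates.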
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



On the other hand, in science, no clear narrowing of the gap was
apparent. 8/

The NAEP provides a somewhat different view than the SAT of the
magnitude of the achievement gap between black and nonminority students
and of the rate at which that difference is shrinking. The NAEP, in contrast
to the SAT, is designed to assess the degree to which students have
mastered commonly taught material. Moreover, until recently, the NAEP
was scaled in a way that is intuitively clearer--albeit less useful in some
important respects--than the SAT: scores are typically presented as the
average percent of items answered correctly by a given group of students.
In the early 1970s, black students on average correctly answered about a
third fewer items in math and a fourth fewer in reading than did their
nonminority peers. 9/ For example, nonminority nine-year-olds averaged 60
items correct in mathematics, compared with about 40 items answered
correctly by the average black student. In proportional terms, these
differences were quite similar in all three age groups tested.
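
The "third fewer" and "fourth fewer" comparisons are proportional shortfalls relative to the nonminority average. A minimal sketch of that calculation follows; the function name is hypothetical, and the inputs are the age-nine averages as read from Tables E-2 and E-3.

```python
def proportional_shortfall(nonminority_pct: float, black_pct: float) -> float:
    """Share by which the black average falls below the nonminority average."""
    return (nonminority_pct - black_pct) / nonminority_pct


# Age-nine averages (percent of items answered correctly) as read from the
# tables: mathematics, 1972 (estimated), and reading, 1970.
math_1972 = proportional_shortfall(60.1, 40.2)
reading_1970 = proportional_shortfall(66.4, 49.7)
print(f"mathematics, 1972: {math_1972:.0%} fewer items correct")
print(f"reading, 1970:     {reading_1970:.0%} fewer items correct")
```

Expressing the gap as a proportion of the nonminority average is what allows the comparison across age groups whose item sets and raw averages differ.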

Throughout the 1970s, differences between black and nonminority
students in NAEP scores shrank more rapidly among elementary and junior-
high students than among high school students. Among nine-year-olds, the
average black student's mathematics score was roughly a fourth below the
average nonminority score in 1981, compared with a third below in 1972. In
reading, the average black score went from a fourth below the



8. See Nancy W. Burton and Lyle V. Jones, "Recent Trends in Achievement Levels of Black
and White Youth," Educational Researcher, vol. 11 (April 1982), pp. 10-14, 17. Burton
and Jones suggest that the racial gap has narrowed in science as well, but that change
appears largely to be an artifact of differences in the content of the tests given in different
pairs of years. When the 1972-1976 change in racial differences on the item set
administered in both of those years is added to the 1969-1972 change on the set used
in both of those years, the trend in the racial difference over the entire period considered
is nearly zero. This can be seen from their Figures 4 and 5 and, more precisely, from
Tables A-2, A-3, and A-4 in National Assessment of Educational Progress, Three National
Assessments of Science.

9. In these reading data, Hispanics are included in the nonminority category (National
Assessment of Educational Progress, Three National Assessments of Reading, p. 2).
While including Hispanics in the nonminority category lowers the average score of
that group, its effect on the trends is unclear. On the one hand, the relative gains of
Hispanic students during that period--described subsequently--would make the trends
in the nonminority group more favorable and thus attenuate the comparative gains
among blacks. On the other hand, the growth of the Hispanic share of the school-age
population would make trends in the nonminority group less favorable and thus
exaggerate the relative gains of blacks. In contrast, in the mathematics data, Hispanic
students are separated (National Assessment of Educational Progress, Changes in
Mathematical Achievement
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nonminority score in 1970 to less than 16 percent below in 1979. The gap
narrowed slightly less among 13-year-olds and very little among
17-year-olds.

In the most recent (1983) reading assessment, NAEP scores are
reported in terms of "proficiency scores" that permit comparison of the
performance of students in different age groups--providing yet another way
of gauging the gap between black and nonminority students. Through the
1979 assessment, these data reveal the same pattern noted above, with one
addition: through 1979, black 17-year-olds were on average less proficient
in reading than nonminority 13-year-olds (see Figure IV-5 in Chapter IV). 10/

Since 1979, these new NAEP data indicate that the closing of the gap
between black and nonminority students accelerated among 17-year-olds
while ending among nine-year-olds. (Because of the large gains among black
17-year-olds, the average performance in that group reached the level of
the average among nonminority 13-year-olds for the first time.) This
pattern makes sense in terms of a cohort model; in both age groups, the
black students born in the mid-1960s contributed the most marked gains (see
Figure IV-5 in Chapter IV). On the other hand, these trends among
17-year-olds are inconsistent with the SAT data, which show the relative gains of
black students ending in the last few years.



State-Level Data

Statewide assessments from two states, North Carolina and Texas, provide
trend data separately for black and nonminority students, and both show a
narrowing of the gap between the two groups. The North Carolina statewide
assessment program provides average scores of black and white students on
a standardized achievement test (the CAT) since 1977. In all three grades
tested (3, 6, and 9), the gap has narrowed considerably (see Figure E-3). 11/



10. In these tabulations, Hispanics are included in the white (or nonminority) category
only in 1970 (National Assessment of Educational Progress, The Reading Report Card,
Data Appendix). Their being included only in the base year and excluded thereafter
exaggerated the improvement among whites, thus attenuating the relative gains of
black students.

11. The trends in Figure E-3 were calculated using the total standard deviation from the
1977 norming sample for the California Achievement Tests (California Achievement
Tests, Forms C and D, Technical Bulletin 2 (Monterey: CTB/McGraw-Hill, 1978),
Table 8). If standard deviations based on the North Carolina data were available, their
use would have altered the specific numbers in Figure E-3, but the differences most
likely would have been relatively small, and the convergence of black and white students'
scores would still be apparent.
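
The standardization described in this footnote divides a difference in average scores by a standard deviation, so the choice of standard deviation rescales every year's gap by the same factor. The sketch below illustrates the point with hypothetical numbers only; none of the means or standard deviations come from the North Carolina data or the CAT norming sample.

```python
def standardized_gap(mean_a: float, mean_b: float, sd: float) -> float:
    """Difference between two group means expressed in standard deviations."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_a - mean_b) / sd


# Hypothetical group means in two years and two candidate standard deviations.
early = (52.0, 40.0)   # (white mean, black mean) in an earlier year
later = (54.0, 46.0)   # (white mean, black mean) in a later year
norming_sd, state_sd = 20.0, 18.0

for label, sd in (("norming-sample SD", norming_sd), ("state-sample SD", state_sd)):
    gap_early = standardized_gap(*early, sd)
    gap_later = standardized_gap(*later, sd)
    print(f"{label}: {gap_early:.2f} -> {gap_later:.2f} "
          f"(later gap is {gap_later / gap_early:.0%} of earlier gap)")
```

With a single fixed standard deviation, the specific numbers change but the proportional narrowing does not, which is consistent with the footnote's expectation that convergence would remain apparent.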
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SOURCES: CBO calculations based on North Carolina State Department of Education, unpublished tabulations, and
California Achievement Tests, Forms C and D: Technical Bulletin 1 (Monterey: CTB/McGraw-Hill, 1979).



Black ninth-grade students have also improved their average achievement on
the Texas statewide mathematics and reading tests more rapidly than have
nonminority students during the few years for which data are available (see
Figure E-4). 12/



HISPANIC STUDENTS 



As noted in Chapter IV, trend data about Hispanic students are sparser than
those about black students, and their meaning is clouded by inconsistencies



12. The Texas scores are tabulated as percentages of students in each group exceeding a
specific criterion score. Since the proportion of white students exceeding the criterion
is very high, the convergence of the scores of black and nonminority students may in part
reflect a "ceiling" effect--that is, the fact that the success rate among nonminority
students cannot rise much more. Even after a mathematical correction of this problem
(normalizing the proportions with a logit transformation), however, the gap appears
to be narrowing appreciably, albeit at a slower rate than in the unadjusted data presented
in Figure E-4.
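
The logit correction mentioned in this footnote can be illustrated briefly. The sketch below uses hypothetical passing rates, not the Texas results, and hypothetical function names; it shows how a gap between two passing rates can narrow sharply in percentage points yet more modestly on the logit (log-odds) scale when the higher rate is near its ceiling.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def gaps(p_high: float, p_low: float) -> tuple[float, float]:
    """Gap between two passing rates on the raw and the logit scales."""
    return p_high - p_low, logit(p_high) - logit(p_low)


# Hypothetical passing rates in a first and a last year (not source data).
raw_start, logit_start = gaps(0.90, 0.60)
raw_end, logit_end = gaps(0.93, 0.78)

raw_narrowing = 1.0 - raw_end / raw_start
logit_narrowing = 1.0 - logit_end / logit_start
print(f"raw gap narrowed by   {raw_narrowing:.0%}")
print(f"logit gap narrowed by {logit_narrowing:.0%}")
```

In this made-up case the gap still narrows after the transformation, only by a smaller proportion, which is the pattern the footnote describes.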
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Figure E-4.

Percentages of Grade-Nine Texas Students Passing
Mathematics and Reading Tests, for Three Ethnic Groups

[Figure: percent passing (vertical axis) by test year, 1980-1984 (horizontal axis); panel titles include Mathematics; series include White and Hispanic.]

SOURCE: W. James Popham, Keith L. Cruse, Stuart C. Rankin, Paul D. Sandifer, and Paul L. Williams,
"Measurement-Driven Instruction: It's on the Road," Phi Delta Kappan, vol. 66 (May 1985), pp. 628-634.



in the categorization of Hispanics and differences among various Hispanic
groups. In addition, the small number of Hispanic students in many sources
of data leads to instability and unreliability in estimates of trends within
that group--a problem that is exacerbated when the scores of Hispanic
students are reported separately for different Hispanic groups, such as
Mexican Americans and Puerto Ricans. 13/ Given that unreliability,
consistency of the trends among a variety of tests is particularly important.

Of the five data sources used in this report that provided trend data on
Hispanic students, all but one showed a clear narrowing of the gap between
nonminority students and at least one Hispanic group. The sole exception is
local data from the Montgomery County (Maryland) public schools, which
showed slight and not entirely consistent increases in the size of the
gap. 14/



13. Average scores of various Hispanic subgroups could be pooled, but the differences in
both achievement levels and recent trends among these groups--documented in this
appendix--argue against that approach when separate tabulations are available.

14. Montgomery County (Maryland) Public Schools, "MCPS Test Results by Racial/Ethnic
Groups, 197?-1982" (unpublished, 1982).
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The SAT 

College Board data distinguish between two Hispanic groups: Mexican
Americans and Puerto Ricans.

The narrowing of the gap between Mexican-American and nonminority
students has been fairly consistent since the first year of data and appears
on both scales (see Figure E-1). Over the full nine years of data, the
convergence of scores between Mexican-American and nonminority students
is 75 percent or 80 percent as great as that between blacks and nonminority
students. As in the case of blacks, the convergence was a bit greater on the
mathematics scale than on the verbal scale. The trend among Mexican
Americans also parallels that among blacks, in that the relative gains appear
to have ended or tapered off in the past few years. The year-to-year
fluctuations in the Mexican-American students' scores, however, call this
short-term pattern into question.

Puerto Rican students also showed gains relative to nonminority
students, but in this case, the gains were both small and far less consistent
from year to year, perhaps partly because of the relatively small number of
Puerto Rican students taking the SAT (see Figure E-1). The relative gains
of Puerto Rican students parallel those of blacks and Mexican Americans in
being greater in mathematics than on the verbal scale. On both scales,
however, their relative gains were only about 40 percent as large as those of
black students over the full nine years.



The NLS and HSB 

The NLS/HSB comparison shows relative gains among both Mexican-
American and other Hispanic students in all three subjects tested (reading,
vocabulary, and mathematics), with Mexican-American students showing a
larger relative gain in vocabulary (see Table E-4). With the exception of the
vocabulary gains by Mexican Americans, the relative gains of Hispanics
were much smaller than those of black students. All of these patterns,
however, are open to question because the Hispanic sample sizes are small.
For that reason, even fairly striking changes are not significantly
different--in a statistical sense--from no change.



The National Assessment of Educational Progress 

The NAEP data show an entirely consistent pattern of relative gains by
Hispanic students (not further separated into subgroups) in both reading and
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TABLE E-4. AVERAGE ACHIEVEMENT OF HISPANIC
AND OTHER STUDENTS IN THE
NLS AND HSB, BY SUBJECT

Group                     1972          1980         Change

Vocabulary
  Mexican American        3.47          3.50           0.03
  Other Hispanic      [illegible]   [illegible]    [illegible]
  Other a/                7.04          6.22          -0.82 b/

Reading
  Mexican American        6.28          5.60          -0.69
  Other Hispanic          6.49          5.72          -0.77
  Other a/               10.51          9.57          -0.94 b/

Mathematics
  Mexican American        8.02          7.54          -0.48
  Other Hispanic          7.48          7.90          -0.41
  Other a/               13.90         12.97          -0.93 b/

SOURCE: Rock and others, Factors Associated with Decline of Test Scores, Tables D-1,
D-2, and D-3.

NOTE: Components might not sum to totals because of rounding.

a. "Other" category includes non-Hispanic whites, Asian Americans, and American
Indians.

b. Statistically significant at the .05 level or less.






mathematics--the only subjects for which such comparisons have been made
available (see Tables E-5 and E-6). These relative gains are apparent in all
three age groups and during periods of both increasing and decreasing
scores. They are generally, but not in every case, smaller than those of
black students. 15/



The Texas State Assessment 

The data from the Texas assessment of mathematics and reading
achievement of ninth-grade students are consistent with the other data
reported here. Hispanic students on average scored between black and
nonminority students, although closer to black students. Moreover, like black
students, they gained relative to the nonminority average (see Figure E-4).



15. Note that in reading, the relevant comparison is the change in blacks' scores from 1974
to 1983, not the change from 1970 that is tabulated in Table E-2. Scores for Hispanics
are not available from the 1970 assessment.
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TABLE E-5. MATHEMATICS PERFORMANCE OF
NONMINORITY AND HISPANIC STUDENTS
IN THE NATIONAL ASSESSMENTS
(Average percentage of items answered correctly)

                         1972                              Change
                    (Estimated) b/     1977      1981     1972-1981

Age 9
  Nonminority a/         60.1          58.1      58.8       -1.28
  Hispanic               46.1          46.6      47.7        1.60

Age 13
  Nonminority a/         62.3          59.9      63.1        0.84
  Hispanic               48.4          43.4      51.9        3.52

Age 17
  Nonminority a/         66.7          63.2      63.1       -3.56
  Hispanic               50.8          48.5      49.4       -1.42

SOURCE: CBO calculations based on National Assessment of Educational Progress,
The Third National Mathematics Assessment: Results, Trends, and Issues,
Table 8.1; and Mathematical Technical Report: Summary Volume, Tables
2, 3, and 4.

a. Nonminority is non-Hispanic white, labeled "white" in the cited sources.

b. These estimates for 1972 differ from published NAEP results for the 1972 assessment.
The published results for that year are based either on the 1972 item pool or on the items
used in both 1972 and 1977, while the trend results comparing the 1977 and 1981
assessments reflect items used in both the 1977 and 1981 assessments. In order to
circumvent the large disparities in the item sets, 1972 results were estimated here by
adjusting the 1977 results (on the items used in 1977 and 1981) by the 1972-to-1977
change (on the items used in 1972 and 1977).
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TABLE E-6. READING PERFORMANCE OF NONMINORITY
AND HISPANIC STUDENTS IN THE
NATIONAL ASSESSMENTS
(Average proficiency scores)

                                                      Change
                       1974      1979      1983     1974-1983

Age 9
  Nonminority a/      215.9     219.7     220.1        4.2
  Hispanic            182.9     189.1     193.0       10.1

Age 13
  Nonminority a/      260.9     263.1     263.4        2.5
  Hispanic            231.1     236.0     239.2        8.1

Age 17
  Nonminority a/      290.7     291.0     294.6        3.9
  Hispanic            254.7     261.7     268.7       14.0

SOURCE: National Assessment of Educational Progress, The Reading Report Card:
Progress Toward Excellence in Our Schools, Data Appendix.

a. Nonminority is non-Hispanic white, labeled "white" in the cited source.
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